THREE NOVEL GENES ENCODING A ZINC FINGER PROTEIN, A GUANINE NUCLEOTIDE EXCHANGE FACTOR 
AND A HEAT SHOCK PROTEIN OR HEAT SHOCK BINDING PROTEIN 

FIELD OF THE INVENTION 

The present invention relates generally to a novel human gene and its derivatives and to 
mammalian, animal, insect, nematode, avian and microbial homologues thereof. The present 
invention further provides pharmaceutical compositions and diagnostic agents as well as genetic 
molecules useful in gene replacement therapy and recombinant molecules useful in protein 
replacement therapy. 

BACKGROUND OF THE INVENTION 

Bibliographic details of the publications referred to by author in this specification are collected 
at the end of the description. 

The increasing sophistication of recombinant DNA technology is greatly facilitating research and 
development in the medical and allied health fields. There is growing need to develop 
recombinant and genetic molecules for use in diagnosis and in conventional pharmaceutical 
preparations as well as in gene and protein replacement therapies. 

In work leading up to the present invention, the inventors sought to identify and clone human 
genes which might be useful as potential diagnostic and/or therapeutic agents. Molecules of 
particular interest targeted by the inventors were gene regulators including regulatory proteins, 
signal transducers and heat shock proteins. 

Gene expression generally requires interaction between a regulatory protein and an appropriate 
recognition sequence of a target gene. Regulatory proteins comprise in many cases a domain or 
motif which facilitates binding to DNA. One particular motif comprises small sequence units 
repeated in tandem with each unit folded about a zinc atom to form separate structural domains. 
This motif is now referred to as a zinc finger domain. Such a domain is generally defined by the 
number of cysteine (C) and histidine (H) residues. 

In addition, knowledge of cellular interaction in the control of cell proliferation is essential in the 
rational design of specific therapeutic strategies aimed at controlling proliferative disorders. 
Such proliferative disorders include a range of cancers, inflammatory conditions and 
atherosclerosis. An important aspect of cellular interaction is in signal transduction via receptors 
to intracellular transducers. One key signal transducer is Ras which couples the receptors for 
diverse extracellular signals to different effectors. Ras directly activates the downstream kinase 
Raf which in turn induces the mitogen activated protein kinase (MAPK) cascade. 

Another regulatory mechanism involves heat shock proteins. The Escherichia coli heat shock 
protein, DnaJ, is the founding member of a family of proteins which are associated with protein 
folding, protein complex assembly and transit through subcellular components. 

Prokaryotic and eukaryotic DnaJ homologues have a modular organisation consisting of a J 
domain, a glycine-rich spacer, CXXCXGXG [SEQ ID NO: 1] repeats and a C-terminal region 
with no obvious sequence features, as well as additional sequences for protein targeting. The 
J domain is anticipated to mediate interaction with heat shock 70 proteins (Hsp70) and consists 
of some 70 amino acids, frequently located at the N-terminus of the protein. 

In accordance with the present invention, genes have been identified from the human genome 
which encode proteins having a regulatory role. One gene, in accordance with the present 
invention, encodes a protein with an N-terminal region resembling a zinc-finger domain of a novel 
type. Another gene encodes a protein involved in guanine nucleotide exchange factor (GEF) 
signalling pathways. Yet another gene encodes a protein which is a heat shock protein or heat 
shock-like protein which may have a role in tumour suppression. 

SUMMARY OF THE INVENTION 

Throughout this specification, unless the context requires otherwise, the word "comprise", or 
variations such as "comprises" or "comprising", will be understood to imply the inclusion of a 
stated element or integer or group of elements or integers but not the exclusion of any other 
element or integer or group of elements or integers. 



WO 98/53061 PCT/AU98/00380 

Sequence identity numbers (SEQ ID NOs.) for nucleotide and amino acid sequences referred to 
in the subject specification are defined after the bibliography. A summary of SEQ ID NOs. is 
also given in Table 1. 

One aspect of the present invention contemplates an isolated nucleic acid molecule comprising 
a sequence of nucleotides encoding or complementary to a sequence encoding an amino acid 
sequence having homology to a regulator of gene expression or a derivative of said gene 
regulator. 

Another aspect of the present invention provides an isolated nucleic acid molecule comprising 
a sequence of nucleotides encoding or complementary to a sequence encoding a regulator of 
gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 type. 

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:2 defines the gene, mcg4. This gene encodes 
a product, MCG4, having an amino acid sequence set forth in SEQ ID NO:3. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg4 
gene portion, which mcg4 gene portion is capable of encoding an MCG4 polypeptide or a 
functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg4, said method comprising determining the presence 
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one 
or both alleles of said mcg4 wherein the presence of such a nucleotide substitution, deletion 
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg4, said method comprising screening for a single or 
multiple amino acid substitution, deletion and/or addition to MCG4 wherein the presence of such 
a mutation is indicative of said condition or a propensity to develop said condition. 

Another aspect of the present invention contemplates a method for detecting MCG4 or a 
derivative thereof in a biological sample, said method comprising contacting said biological 
sample with an antibody specific for MCG4 or its derivatives or homologues for a time and under 
conditions sufficient for an antibody-MCG4 complex to form, and then detecting said complex. 

A further aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a guanine nucleotide exchange factor (GEF) or a 
derivative thereof. 

Yet another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 



(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 
or 7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:4 or 6 defines the gene, mcg7. This gene 
encodes a product, MCG7, having an amino acid sequence set forth in SEQ ID NO:5 or 7. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human mcg7 
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a 
functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg7, said method comprising determining the presence 
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one 
or both alleles of said mcg7 wherein the presence of such a nucleotide substitution, deletion 
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg7, said method comprising screening for a single or 
multiple amino acid substitution, deletion and/or addition to MCG7 wherein the presence of such 
a mutation is indicative of said condition or a propensity to develop said condition. 

Another aspect of the present invention contemplates a method for detecting MCG7 or a 
derivative thereof in a biological sample, said method comprising contacting said biological 
sample with an antibody specific for MCG7 or its derivatives or homologues for a time and under 
conditions sufficient for an antibody-MCG7 complex to form, and then detecting said complex. 

Yet another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock binding protein 
or a derivative thereof. 

Another aspect of the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The nucleotide sequence set forth in SEQ ID NO:8 defines the gene, mcg18. This gene encodes 
a product, MCG18, having an amino acid sequence set forth in SEQ ID NO:9. 

Even yet another aspect of the present invention provides a genetic construct comprising a vector 
portion and an animal, more particularly a mammalian and even more particularly a human 
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide 
or a functional or immunologically interactive derivative thereof. 

Still yet another aspect of the present invention contemplates a method of detecting a condition 
caused or facilitated by an aberration in mcg18, said method comprising determining the presence 
of a single or multiple nucleotide substitution, deletion and/or addition or other aberration to one 
or both alleles of said mcg18 wherein the presence of such a nucleotide substitution, deletion 
and/or addition or other aberration may be indicative of said condition or a propensity to develop 
said condition. 

Even still a further aspect of the present invention relates to a method of detecting a condition 
caused or facilitated by an aberration in mcg18, said method comprising screening for a single 
or multiple amino acid substitution, deletion and/or addition to MCG18 wherein the presence of 
such a mutation is indicative of said condition or a propensity to develop said condition. 

Another aspect of the present invention contemplates a method for detecting MCG18 or a 
derivative thereof in a biological sample, said method comprising contacting said biological 
sample with an antibody specific for MCG18 or its derivatives or homologues for a time and 
under conditions sufficient for an antibody-MCG18 complex to form, and then detecting said 
complex. 

A summary of SEQ ID Nos. referred to in the subject specification is shown in Table 1. 

TABLE 1 
SUMMARY OF SEQ ID Nos. 

SEQ ID NO.    DESCRIPTION 

1             amino acid repeat sequence in DnaJ homologues 
2             nucleotide sequence of mcg4 
3             amino acid sequence of MCG4 
4             nucleotide sequence of mcg7 
5             amino acid sequence of MCG7 
6             nucleotide sequence of mcg7 within exon of nucleotides 183-288 
7             amino acid sequence of MCG7 within exon of nucleotides 183-288 
8             nucleotide sequence of mcg18 
9             amino acid sequence of MCG18 
10-18         amino acid sequence identified using BESTFIT 
19            sequence of pGEX and mcg7 junction 
20            sequence of pGEX and mcg7 junction 
21            nucleotide sequence of myc-tag/mcg7 junction 
22            amino acid sequence corresponding to SEQ ID NO:21 
23            nucleotide sequence of pGEX and mcg7 junction 
24            amino acid sequence corresponding to SEQ ID NO:23 
25-36         mcg7-specific oligonucleotide 
37-45         mcg18-specific oligonucleotide 

Single and three letter abbreviations for amino acid residues are shown in Table 2. 

TABLE 2 

Amino Acid       Three-letter     One-letter 
                 Abbreviation     Symbol 

Alanine          Ala              A 
Arginine         Arg              R 
Asparagine       Asn              N 
Aspartic acid    Asp              D 
Cysteine         Cys              C 
Glutamine        Gln              Q 
Glutamic acid    Glu              E 
Glycine          Gly              G 
Histidine        His              H 
Isoleucine       Ile              I 
Leucine          Leu              L 
Lysine           Lys              K 
Methionine       Met              M 
Phenylalanine    Phe              F 
Proline          Pro              P 
Serine           Ser              S 
Threonine        Thr              T 
Tryptophan       Trp              W 
Tyrosine         Tyr              Y 
Valine           Val              V 
Any residue      Xaa              X 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 is a representation of the nucleotide sequence [SEQ ID NO:2] and corresponding 
amino acid sequence [SEQ ID NO:3] of mcg4. 

Figure 2 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial murine expressed sequence tag (EST). 

Figure 3 is a representation of the alignment of the human MCG4 amino acid sequence with a 
translation of a partial nematode EST. 

Figure 4 is a diagrammatic representation showing a predicted structure of MCG4 where H and 
C represent histidine and cysteine residues, respectively and X refers to any amino acid residue. 
Zn represent zinc atoms. 

Figure 5 is a representation of a sensitive sequence homology search of related cysteine-containing 
motifs in another Caenorhabditis elegans protein. 

Figure 6 is a representation showing that a related cysteine-containing motif is present in the 
GATA-binding transcription factor from Saccharomyces pombe. 

Figure 7 is a Northern blot showing expression of mcg4 in various cultured human cancer cell 
lines. Lanes 1-5, respectively, represent the hybridization signal from 15 µg total RNA derived 
from various human cancer cell lines. Lanes 1-5, respectively, contain RNA from H69 lung 
carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, HaCat transformed 
keratinocytes and T24 bladder carcinoma cells. 

Figure 8 is a representation of a partial alignment of mcg4 with human ESTs AA074703 and 
AA134788. 

Figure 9 is a representation of the partial nucleotide sequence alignment between a human 
(W32939) and mouse (AA242159) mcg4-like EST in the putative 5' UTR of the mcg4 cDNA. 
The putative initiation codon is underlined and the region upstream represents 5' UTR. 

Figure 10 is a representation showing MacVector alignment of MCG4 with forward translations 
of ESTs AA134788 and AA074703. The nucleotide sequences are shown in Figure 8. 

Figure 11 is a diagrammatic representation of the domains of MCG4: 

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C 
acidic domain consensus: 9/34 amino acids negatively charged, 0/34 positively charged 
basic domain consensus: 13/55 amino acids positively charged, 0/55 negatively charged 
leucine zipper domain consensus: LX6LX6RX6LX6L 

alternate "novel" leucine zipper-like motif where leucine would not be aligned along the one 
surface of an alpha helix domain: (aa 261) LX6LXLX6LXLX6L (aa 286). 
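
The consensus notation used for Figure 11 can be read as a spacing pattern in which Xn stands for n residues of any type. The following is a minimal sketch, not part of the specification, of how such a consensus could be converted into a regular expression for scanning a protein sequence. The helper name `consensus_to_regex` is hypothetical, and the consensus strings are written with flattened subscripts, the X17 spacer being an assumed reading of the printed text.

```python
import re

# Consensus strings as read from the Figure 11 description, with subscripts
# flattened. "X<n>" means n residues of any type. The X17 spacer is an
# assumed reading of the printed consensus.
ZINC_FINGER_CONSENSUS = "CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C"
ZIPPER_LIKE_CONSENSUS = "LX6LXLX6LXLX6L"


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Convert a consensus such as 'CX2HX4C' into a compiled regex.

    Fixed residues are kept as literals; 'X' followed by an optional count
    becomes a run of that many arbitrary characters (default 1).
    """
    parts = []
    for residue, count in re.findall(r"([A-Z])(\d*)", consensus):
        if residue == "X":
            n = int(count) if count else 1
            parts.append(f".{{{n}}}")
        else:
            parts.append(residue)
    return re.compile("".join(parts))


def find_motifs(pattern: re.Pattern, protein: str):
    """Return (1-based start, 1-based end) for each non-overlapping match."""
    return [(m.start() + 1, m.end()) for m in pattern.finditer(protein)]
```

A pattern built this way only checks spacing between the fixed residues; it says nothing about whether a matching segment actually folds around zinc.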

Figure 12 is a representation showing similarity of MCG7 with GEFs of various organisms. 

Figure 13(a) is a representation of the nucleotide sequence [SEQ ID NO:4] and corresponding 
amino acid sequence [SEQ ID NO:5] of mcg7. Nucleotides 183-288 are an alternatively spliced 
exon (shown in lower case). 

Figure 13(b) is a representation of the partial nucleotide sequence [SEQ ID NO:6] and 
corresponding amino acid sequence [SEQ ID NO:7] of mcg7 but without the exon shown in Fig. 
13(a). Amino acids have been numbered from the first methionine codon (underlined). The 
cDNA molecules of Fig. 13(a) and Fig. 13(b) differ by the inclusion and exclusion of the exon 
of nucleotides 183-288. 

Figure 14 is a representation showing a comparison between MCG7 and a homologue from 
Caenorhabditis elegans using the BESTFIT algorithm. In the figure, the following sequences 
are underlined: 

EF-Hand = PROSITE DATABASE NO. PDOC00018 

1a nematode DVDEEDEVEDffiF [SEQ ID NO: 10] 
1b human DVDGDGHISQEEF [SEQ ID NO: 11] 
   nematode DHDRDGFISQEEF [SEQ ID NO: 12] 
1c human DQNQDGCISREEM [SEQ ID NO: 13] 
   nematode DVDMDGQISKDEL [SEQ ID NO: 14] 

GUANINE NT BINDING REGION = BLOCKS DATABASE NO. BL00720B 

2 human HFVHVAEKUJQLQ^^FNTIJvlAWGGLSHSSISRLKETH [SEQ ID NO: 15] 
  nematode KFVHVAKHLRKINNFNTLMSVVGGITHSSVARLAKTY [SEQ ID NO: 16] 

DAG-PE BINDING DOMAIN = PROSITE DATABASE NO. PDOC00379 

3 human HNFQESNSLRPVACRHCKALILGIYKQGLKCRACGVNCHKQCKDRLSVEC [SEQ ID NO: 17] 
  nematode H>IFHETrFLTPTTCNHCNKLLWGILRQGFKCKDCGLAVHSCCKSNAVAEC [SEQ ID NO: 18] 

Figure 15 is a representation of an alignment of human and a partial (5' UTR and partial coding 
sequence) murine mcg7 cDNA (GenBank Acc. No. W71787 and AA237373). The putative 
initiation codon is underlined. The murine sequence represents a composite of 2 partial cDNA 
sequences from the EST database (accession numbers W71787 and AA237373). Nucleotide 
differences between human and murine sequences are shown in lower case lettering and identical 
residues are indicated with asterisks. 

Figure 16 is a representation of further 5' nucleotide and corresponding amino acid sequence for 
human mcg7. Nucleotide positions 1-321 were derived from GenBank Acc. No. AC000134 and 
nucleotides 322 onwards from Fig. 13(a). Two in-frame initiation codons are underlined. 
Asterisks denote in-frame stop codons. 

Figure 17 is a graphical representation of a GDP release assay. □ Experiment #1 (mean of 
duplicates). 0 Experiment #2 (mean of duplicates). The exchange reaction contained 36 pmol 
of GST-MCG (N-terminally truncated; encoded by Construct B in Fig. 18) and 1.6-12.8 pmol 
of recombinant GST-N-Ras.GDP. Reaction time 6 min. 
Estimated reaction constants: 
Km = 2.1 µM, Vmax = 37 pmol/6 min/36 pmol [Expt #1] 
Km = 1.5 µM, Vmax = 30.3 pmol/6 min/36 pmol [Expt #2] 

Figure 18 depicts various recombinant plasmids containing partial or full-length mcg7. 

Figure 19 is a representation of the nucleotide sequence [SEQ ID NO:8] and corresponding 
amino acid sequence [SEQ ID NO:9] of mcg18. 

Figure 20 is a representation showing that MCG18 has partial homology to E. coli DnaJ. 

Figure 21 is a representation showing that MCG18 has homology to two Caenorhabditis elegans 
proteins. 

Figure 22 is a representation showing that MCG18 has homology to a Saccharomyces pombe 
protein. 

Figure 23 is a representation showing homology of MCG18 to a Drosophila virilis protein. 

Figure 24 is a representation showing homology of MCG18 to human DnaJ proteins HDJ- 
2/HSDJ, HDJ-1/HSP40 and HSJ1. 

Figure 25 is a representation of the nucleotide and corresponding amino acid sequence of murine 
mcg18. 

Figure 26 is a representation of homology between human and murine MCG18. 

Figure 27 depicts nucleotide sequences corresponding to the 5' untranslated region of human 
mcg18. 

Figure 28 depicts a Northern blot showing expression of mcg18 transcripts in total RNA isolated 
from various human cancer cell lines grown in culture. Lanes 1-5 respectively contain 15 µg 
RNA from H69 lung carcinoma cells, JAM ovary carcinoma cells, BT20 breast carcinoma cells, 
HaCat transformed keratinocytes and T24 bladder carcinoma cells. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

The present invention provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a regulator of gene expression or a derivative of said gene regulator. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding a 
regulator of gene expression wherein said regulator comprises a zinc finger domain of an (HC3)2 
type. 

Still more particularly, the present invention provides an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

The present invention also provides an isolated nucleic acid molecule comprising a sequence of 
nucleotides encoding or complementary to a sequence encoding an amino acid sequence having 
homology to a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 
or 7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Another aspect of the present invention contemplates an isolated nucleic acid molecule 
comprising a sequence of nucleotides encoding or complementary to a sequence encoding an 
amino acid sequence having homology to a heat shock protein or a heat shock-binding protein 
or a derivative thereof. 

More particularly, the present invention is directed to an isolated nucleic acid molecule 
comprising a sequence of nucleotides or a complementary form thereof selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridizing under low stringency conditions at 42°C 
to the nucleotide sequence set forth in (i), (ii) or (iii). 

Preferably, the percentage similarity is at least about 50%. More preferably, the percentage 
similarity is at least about 60%. 

Reference herein to a low stringency at 42°C includes and encompasses from at least about 1% 
v/v to at least about 15% v/v formamide and from at least about 1M to at least about 2M salt for 
hybridisation, and at least about 1M to at least about 2M salt for washing conditions. Alternative 
stringency conditions may be applied where necessary, such as medium stringency, which 
includes and encompasses from at least about 16% v/v to at least about 30% v/v formamide and 
from at least about 0.5M to at least about 0.9M salt for hybridisation, and at least about 0.5M 
to at least about 0.9M salt for washing conditions, or high stringency, which includes and 
encompasses from at least about 31% v/v to at least about 50% v/v formamide and from at least 
about 0.01M to at least about 0.15M salt for hybridisation, and at least about 0.01M to at least 
about 0.15M salt for washing conditions. 
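
The three stringency levels defined above can be summarised as ranges of formamide concentration and salt molarity. The sketch below is illustrative only: the function name `classify_stringency` is hypothetical, and it treats the "about" limits in the text as hard cut-offs, which is a simplification.

```python
from typing import Optional

# Hybridisation ranges restated from the definition of low, medium and high
# stringency: (formamide % v/v min, max, salt molarity min, max).
# Treating the "about" limits as exact bounds is an assumption of this sketch.
STRINGENCY_RANGES = {
    "low": (1.0, 15.0, 1.0, 2.0),
    "medium": (16.0, 30.0, 0.5, 0.9),
    "high": (31.0, 50.0, 0.01, 0.15),
}


def classify_stringency(formamide_pct: float, salt_molar: float) -> Optional[str]:
    """Return 'low', 'medium' or 'high' if both values fall in one range.

    Returns None when the combination does not fit any single defined level,
    for example low formamide combined with low salt.
    """
    for name, (f_lo, f_hi, s_lo, s_hi) in STRINGENCY_RANGES.items():
        if f_lo <= formamide_pct <= f_hi and s_lo <= salt_molar <= s_hi:
            return name
    return None
```

For example, `classify_stringency(10, 1.5)` falls in the low stringency range, while a mixed combination such as 10% formamide with 0.1M salt is left unclassified rather than forced into a level.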

The term "similarity" as used herein includes exact identity between compared sequences at the 
nucleotide or amino acid level. Where there is non-identity at the nucleotide level, "similarity" 
includes differences between sequences which result in different amino acids that are nevertheless 
related to each other at the structural, functional, biochemical and/or conformational levels. 
Where there is non-identity at the amino acid level, "similarity" includes amino acids that are 
nevertheless related to each other at the structural, functional, biochemical and/or conformational 
levels. 

The present invention extends to nucleic acid molecules with percentage similarities of 
approximately 65%, 70%, 75%, 80%, 85%, 90% or 95% or above or a percentage in between. 
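
As an illustration of how a percentage threshold might be checked in practice, the sketch below computes percent identity over two sequences that have already been aligned to equal length. It covers only the exact-identity component of "similarity" as defined above; scoring related but non-identical residues would need a substitution scheme that the specification does not prescribe. The function names and the gap character are assumptions.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a gap opposite a
    residue counts as a compared, non-identical position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions in alignment")
    return 100.0 * matches / compared


def meets_threshold(seq_a: str, seq_b: str, threshold_pct: float = 40.0) -> bool:
    """True if identity is at least the given percentage (default 40%)."""
    return percent_identity(seq_a, seq_b) >= threshold_pct
```

Because identity is a subset of similarity, a pair that passes this check at a given percentage would also satisfy the broader similarity definition at that percentage.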

The nucleic acid molecule of the present invention defined by SEQ ID NO:2 is hereinafter 
referred to as constituting the "mcg4" gene. The protein encoded by mcg4 is referred to herein 
as "MCG4" and has an amino acid sequence set forth in SEQ ID NO:3. The mcg4 gene is 
proposed to encode, in accordance with the present invention, a regulator of gene expression 
which comprises a novel zinc finger domain, (HC3)2. A regulator of gene expression includes a 
transcription factor. Regulation may be at the level of nucleic acid:protein or protein:protein 
interaction. 

The nucleic acid molecule of the present invention defined by SEQ ID NO:4 or 6 is hereinafter 
referred to as constituting the "mcg7" gene. The protein encoded by mcg7 is referred to herein 
as "MCG7" and has an amino acid sequence set forth in SEQ ID NO:5 or 7 and is involved in 
signal transduction. The difference in the nucleotide and amino acid sequence is due to the 
presence or absence of an exon at nucleotides 183-288. 
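
The relationship between the two mcg7 forms can be sketched as the removal of a fixed interval from the longer sequence. The example below assumes that nucleotide positions are 1-based and inclusive, which the specification does not state explicitly, and uses a synthetic placeholder sequence rather than SEQ ID NO:4. The function name `excise_region` is hypothetical.

```python
# Coordinates of the alternatively spliced exon as given in the text.
EXON_START = 183
EXON_END = 288


def excise_region(sequence: str, start: int, end: int) -> str:
    """Return `sequence` without positions start..end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("interval must lie within the sequence")
    return sequence[: start - 1] + sequence[end:]


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


if __name__ == "__main__":
    # Synthetic stand-in: upper case flanks, lower case "exon", mirroring the
    # lower-case convention described for Figure 13(a).
    placeholder = "A" * 182 + "c" * region_length(EXON_START, EXON_END) + "G" * 50
    shorter = excise_region(placeholder, EXON_START, EXON_END)
    print(len(placeholder) - len(shorter))
```

Under the stated numbering assumption the excised interval spans 106 nucleotides.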

The nucleic acid molecule of the present invention defined by SEQ ID NO:8 is hereinafter 
referred to as constituting the "mcg18" gene. The protein encoded by mcg18 is referred to 
herein as "MCG18" and comprises the amino acid sequence set forth in SEQ ID NO:9. 

The present invention extends to the naturally occurring genomic mcg4, mcg7 and mcg18 
nucleotide sequences or corresponding cDNA sequences or to derivatives thereof. Derivatives 
contemplated in the present invention include fragments, parts, portions, mutants, homologues 
and analogues of MCG4, MCG7 or MCG18 or the corresponding genetic sequences. Derivatives 
also include single or multiple amino acid substitutions, deletions and/or additions to MCG4, 
MCG7 or MCG18 or single or multiple nucleotide substitutions, deletions and/or additions to 
mcg4, mcg7 or mcg18. "Additions" to the amino acid or nucleotide sequences include fusions 
with other peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference 
herein to "MCG4" or "mcg4", "MCG7" or "mcg7" or "MCG18" or "mcg18" includes reference to 
all derivatives thereof including functional derivatives and immunologically interactive derivatives 
of MCG4, MCG7 or MCG18. 

The mcg4, mcg7 and mcg18 of the present invention are particularly exemplified herein from 
humans and in particular from human chromosome 11q13. 

The present invention extends, however, to a range of homologues from, for example, primates, 
livestock animals (eg. sheep, cows, horses, donkeys, pigs), companion animals (eg. dogs, cats), 
laboratory test animals (eg. rabbits, mice, rats, guinea pigs), reptiles, birds (eg. chickens, ducks, 
geese, parrots), insects, nematodes, eukaryotic microorganisms and captive wild animals (eg. 
deer, foxes, kangaroos). Reference herein to mcg4, mcg7 and mcg18 or their respective proteins 
MCG4, MCG7 and MCG18 includes reference to these molecules of human origin as well as 
novel forms of non-human origin. 

The nucleic acid molecules of the present invention may be DNA or RNA. When the nucleic 
acid molecule is in DNA form, it may be genomic DNA or cDNA. RNA forms of the nucleic 
acid molecules of the present invention are generally mRNA. 

Although the nucleic acid molecules of the present invention are generally in isolated form, they 
may be integrated into or ligated to or otherwise fused or associated with other genetic 
molecules such as vector molecules and in particular expression vector molecules. Vectors and 
expression vectors are generally capable of replication and, if applicable, expression in one or 
both of a prokaryotic cell or a eukaryotic cell. Preferably, prokaryotic cells include E. coli, 
Bacillus sp. and Pseudomonas sp. Preferred eukaryotic cells include yeast, fungal, mammalian 
and insect cells. 

Accordingly, another aspect of the present invention contemplates a genetic construct comprising 
a vector portion and an animal, more particularly a mammalian and even more particularly a 
human mcg4 gene portion, which mcg4 gene portion is capable of encoding an MCG4 
polypeptide or a functional or immunologically interactive derivative thereof. 

Preferably, the mcg4 gene portion of the genetic construct is operably linked to a promoter in 
the vector such that said promoter is capable of directing expression of said mcg4 gene portion 
in an appropriate cell. 

In addition, the mcg4 gene portion of the genetic construct may comprise all or part of the gene 
fused to another genetic sequence such as a nucleotide sequence encoding glutathione-S-transferase 
or part thereof. 

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG4 is a transcription factor
involved in gene regulation. Mutations in mcg4 may result in aberrations in gene regulation
leading to the development of or a propensity to develop various types of cancer. In this regard,
although not wishing to limit the present invention to any one hypothesis or mode of action, it
is proposed that mcg4 or its expression product may be involved in the tissue-specific or
temporal regulation of particular genes.

A deletion or aberration in the mcg4 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of a subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg4, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg4 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human mcg7
gene portion, which mcg7 gene portion is capable of encoding an MCG7 polypeptide or a
functional or immunologically interactive derivative thereof.

Preferably, the mcg7 gene portion of the genetic construct is operably linked to a promoter on 
the vector such that said promoter is capable of directing expression of said mcg7 gene portion 
in an appropriate cell. 

In addition, the mcg7 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells
comprising same.

It is proposed in accordance with the present invention that MCG7 is a GEF involved in signal 
transduction. Mutations in mcg7 or MCG7 may result in defective control of cell proliferation 
leading to the development of or a propensity to develop various types of cancer. 

A deletion or aberration in the mcg7 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents of a subject under investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg7, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg7 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

Yet another aspect of the present invention contemplates a genetic construct comprising a vector
portion and an animal, more particularly a mammalian and even more particularly a human
mcg18 gene portion, which mcg18 gene portion is capable of encoding an MCG18 polypeptide
or a functional or immunologically interactive derivative thereof.

Preferably, the mcg18 gene portion of the genetic construct is operably linked to a promoter on
the vector such that said promoter is capable of directing expression of said mcg18 gene portion
in an appropriate cell.

In addition, the mcg18 gene portion of the genetic construct may comprise all or part of the gene
fused to another genetic sequence such as a nucleotide sequence encoding
glutathione-S-transferase or part thereof.

The present invention extends to such genetic constructs and to prokaryotic or eukaryotic cells 
comprising same. 

It is proposed in accordance with the present invention that MCG18 is a transcription factor
involved in protein folding, protein complex assembly and transit through subcellular
compartments. MCG18 may also have a role in tumour suppression. Thus mutations in mcg18
may result in the development of or a propensity to develop various types of cancer.

A deletion or aberration in the mcg18 gene may also be important in the detection of cancer or
a propensity to develop cancer. An aberration may be a homozygous mutation or a
heterozygous mutation. The detection may occur at the foetal or post-natal level. Detection
may also be at the germline or somatic cell level. Furthermore, a risk of developing cancer may
be determined by assaying for aberrations in the parents and/or proband of the subject under
investigation.

According to this aspect of the present invention, there is contemplated a method of detecting
a condition caused or facilitated by an aberration in mcg18, said method comprising determining
the presence of a single or multiple nucleotide substitution, deletion and/or addition or other
aberration to one or both alleles of said mcg18 wherein the presence of such a nucleotide
substitution, deletion and/or addition or other aberration may be indicative of said condition or
a propensity to develop said condition.

The nucleotide substitutions, additions or deletions may be detected by any convenient means
including nucleotide sequencing, restriction fragment length polymorphism (RFLP), polymerase
chain reaction (PCR), oligonucleotide hybridization and single stranded conformation
polymorphism (SSCP) analysis amongst many others. An aberration includes modification to
existing nucleotides such as to modify a glycosylation signal amongst other effects.
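
By way of illustration only, once a sample sequence has been obtained by nucleotide sequencing
and aligned to a reference sequence, substitutions, additions and deletions can be listed by a
position-by-position comparison. The following Python sketch assumes two pre-aligned sequences
of equal length in which "-" marks a gap; the function name and the example sequences are
hypothetical and are not taken from the mcg4, mcg7 or mcg18 sequences.

```python
def find_aberrations(reference: str, sample: str) -> list[tuple[int, str, str, str]]:
    """Compare two pre-aligned sequences and report every difference.

    Returns (position, kind, reference_base, sample_base) tuples, where
    position is 1-based along the alignment and '-' denotes a gap.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for pos, (ref, obs) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if ref == obs:
            continue
        if ref == "-":
            kind = "addition"      # base present in the sample only
        elif obs == "-":
            kind = "deletion"      # base missing from the sample
        else:
            kind = "substitution"  # different base at the same position
        changes.append((pos, kind, ref, obs))
    return changes


# Hypothetical aligned sequences, for illustration only
reference = "ATGGCC-TTAGA"
sample = "ATGACCGTT-GA"
for change in find_aberrations(reference, sample):
    print(change)
```

A heterozygous aberration would, under the same assumptions, appear in only one of the two
allele sequences when each allele is compared to the reference in turn.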

In an alternative method, aberrations in the mcg4, mcg7 and mcg18 genes are detected by
screening for mutations in MCG4, MCG7 and MCG18, respectively.

A mutation in MCG4, MCG7 or MCG18 may be a single or multiple amino acid substitution,
addition and/or deletion. The mutation in mcg4, mcg7 or mcg18 may also result in either no
translation product being produced or a product in truncated form. A mutation may also
comprise an altered glycosylation pattern or the introduction of side chain modifications to
amino acid residues.

According to this aspect of the present invention, there is provided a method of detecting a
condition caused or facilitated by an aberration in mcg4, mcg7 or mcg18, said method comprising
screening for a single or multiple amino acid substitution, deletion and/or addition to MCG4,
MCG7 or MCG18 wherein the presence of such a mutation is indicative of said condition or a
propensity to develop said condition.
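
The distinction drawn above between amino acid substitutions, truncated products and absence
of a translation product can be sketched computationally. The following Python example is a
simplified illustration only: it assumes that each input is a coding sequence read in frame
from its first codon, it uses the standard genetic code, and the function names and the short
example sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def classify_mutation(wild_type_cds: str, mutant_cds: str) -> str:
    """Describe the effect of a mutant coding sequence on the translation product."""
    wt = translate(wild_type_cds)
    mut = translate(mutant_cds)
    if mut == wt:
        return "no amino acid change"
    if not mut:
        return "no translation product"
    if len(mut) < len(wt) and wt.startswith(mut):
        return "truncated product"
    if len(mut) == len(wt):
        subs = [f"{a}{i}{b}" for i, (a, b) in enumerate(zip(wt, mut), start=1) if a != b]
        return "substitution(s): " + ", ".join(subs)
    return "other alteration"


# Hypothetical sequences, for illustration only
wild_type = "ATGGCCAAATGA"
print(classify_mutation(wild_type, "ATGGACAAATGA"))  # missense change in codon 2
print(classify_mutation(wild_type, "ATGGCCTAATGA"))  # premature stop codon
```

Alterations such as changed glycosylation patterns or side chain modifications are not
detectable from the coding sequence in this way and would require analysis of the protein.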

A particularly convenient means of detecting a mutation in MCG4, MCG7 or MCG18 is by use 
of antibodies. 

Accordingly, another aspect of the present invention is directed to antibodies to MCG4, MCG7
or MCG18 and their derivatives. Such antibodies may be monoclonal or polyclonal and may be
selected from naturally occurring antibodies to MCG4, MCG7 or MCG18 or may be specifically
raised to MCG4, MCG7 or MCG18 or derivatives thereof. In the case of the latter, MCG4,
MCG7 or MCG18 or their derivatives may first need to be associated with a carrier molecule.

The antibodies to MCG4, MCG7 or MCG18 of the present invention are particularly useful as
diagnostic agents.

For example, antibodies to MCG4, MCG7 or MCG18 and their derivatives can be used to screen
for wild-type MCG4, MCG7 or MCG18 or for mutated MCG4, MCG7 or MCG18 molecules.
The latter may occur, for example, during or prior to certain cancer development. A differential
binding assay is also particularly useful. Techniques for such assays are well known in the art
and include, for example, sandwich assays and ELISA. Knowledge of normal MCG4, MCG7
or MCG18 levels or the presence of wild-type MCG4, MCG7 or MCG18 may be important for
diagnosis of certain cancers or a predisposition for development of cancers or for monitoring
certain therapeutic protocols.

As stated above, antibodies to MCG4, MCG7 or MCG18 of the present invention may be
monoclonal or polyclonal or may be fragments of antibodies such as Fab fragments.
Furthermore, the present invention extends to recombinant and synthetic antibodies and to
antibody hybrids. A "synthetic antibody" is considered herein to include fragments and hybrids
of antibodies.

For example, specific antibodies can be used to screen for wild-type MCG4, MCG7 or MCG18
molecules or specific mutant molecules such as molecules having a certain deletion. This would
be important, for example, as a means for screening for levels of MCG4, MCG7 or MCG18 in
a cell extract or other biological fluid or purifying MCG4, MCG7 or MCG18 made by
recombinant means from culture supernatant fluid or purified from a cell extract. Techniques for
the assays contemplated herein are known in the art and include, for example, sandwich assays
and ELISA.

It is within the scope of this invention to include any second antibodies (monoclonal, polyclonal
or fragments of antibodies or synthetic antibodies) directed to the first mentioned antibodies
discussed above. Both the first and second antibodies may be used in detection assays or a first
antibody may be used with a commercially available anti-immunoglobulin antibody. An antibody
as contemplated herein includes any antibody specific to any region of wild-type MCG4, MCG7
or MCG18 or to a specific mutant phenotype or to a deleted or otherwise altered region.

Both polyclonal and monoclonal antibodies are obtainable by immunization of a suitable animal
or bird with MCG4, MCG7 or MCG18 or their derivatives and either type is utilizable for
immunoassays. The methods of obtaining both types of sera are well known in the art.
Polyclonal sera are less preferred but are relatively easily prepared by injection of a suitable
laboratory animal or bird with an effective amount of MCG4, MCG7 or MCG18 or antigenic
parts thereof or derivatives thereof, collecting serum from the animal or bird, and isolating
specific sera by any of the known immunoadsorbent techniques. Although antibodies produced
by this method are utilizable in virtually any type of immunoassay, they are generally less
favoured because of the potential heterogeneity of the product.

The use of monoclonal antibodies in an immunoassay is particularly preferred because of the
ability to produce them in large quantities and the homogeneity of the product. The preparation
of hybridoma cell lines for monoclonal antibody production derived by fusing an immortal cell
line and lymphocytes sensitized against the immunogenic preparation can be done by techniques
which are well known to those who are skilled in the art.

Another aspect of the present invention contemplates a method for detecting MCG4, MCG7 or
MCG18 or a derivative thereof in a biological sample, said method comprising contacting said
biological sample with an antibody specific for MCG4, MCG7 or MCG18 or its derivatives or
homologues for a time and under conditions sufficient for an antibody-MCG4, MCG7 or
MCG18 complex to form, and then detecting said complex.

Preferably, the biological sample is a cell extract from a human or other animal or a bird. 

The presence of MCG4, MCG7 or MCG18 may be determined in a number of ways such as
by Western blotting and ELISA procedures. A wide range of immunoassay techniques are
available as can be seen by reference to US Patent Nos. 4,016,043, 4,424,279 and 4,018,653.
These include both single-site and two-site or "sandwich" assays of the non-competitive types,
as well as traditional competitive binding assays. These assays also include direct binding of a
labelled antibody to a target.

Sandwich assays are among the most useful and commonly used assays and are favoured for use
in the present invention. A number of variations of the sandwich assay technique exist, and all
are intended to be encompassed by the present invention. Briefly, in a typical forward assay, an
unlabelled antibody is immobilized on a solid substrate and the sample to be tested brought into
contact with the bound molecule. After a period of incubation sufficient to allow formation of
an antibody-antigen complex, a second antibody specific to the antigen, labelled with a reporter
molecule capable of producing a detectable signal, is then added and incubated, allowing time
sufficient for the formation of another complex of antibody-antigen-labelled antibody. Any
unreacted material is washed away, and the presence of the antigen is determined by
observation of a signal produced by the reporter molecule. The results may either
be qualitative, by simple observation of the visible signal, or may be quantitated by comparing
with a control sample containing known amounts of hapten. Variations on the forward assay
include a simultaneous assay, in which both sample and labelled antibody are added
simultaneously to the bound antibody. These techniques are well known to those skilled in the
art, including any minor variations as will be readily apparent. In accordance with the present
invention the sample is one which might contain MCG4, MCG7 or MCG18 including cell extract
or tissue biopsy. The sample is, therefore, generally a biological sample comprising biological
fluid but also extends to fermentation fluid and supernatant fluid such as from a cell culture.
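
The quantitation step referred to above, in which the signal from a test sample is compared
with controls containing known amounts of hapten, may be sketched as interpolation on a
standard curve. The following Python example is an illustration only: it assumes that the
signal increases with concentration over the range of the standards and that straight-line
interpolation between adjacent standards is adequate. The function name, the concentrations
and the absorbance readings are hypothetical.

```python
def interpolate_concentration(standards: list[tuple[float, float]], signal: float) -> float:
    """Estimate a concentration from a signal using a standard curve.

    standards: (known_concentration, measured_signal) pairs from control samples.
    signal:    measured signal of the test sample.
    Uses linear interpolation between the two standards that bracket the signal.
    """
    points = sorted(standards, key=lambda point: point[1])
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standards")


# Hypothetical standards: (concentration in ng/ml, absorbance reading)
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
estimate = interpolate_concentration(standards, 0.35)
print(f"estimated concentration: {estimate:.1f} ng/ml")
```

A signal outside the range of the standards raises an error rather than being extrapolated,
since the sample would in that case ordinarily be diluted or re-assayed.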

In the typical forward sandwich assay, a first antibody having specificity for MCG4, MCG7
or MCG18 or an antigenic part thereof or a derivative thereof is either
covalently or passively bound to a solid surface. The solid surface is typically glass or a polymer,
the most commonly used polymers being cellulose, polyacrylamide, nylon, polystyrene, polyvinyl
chloride or polypropylene. The solid supports may be in the form of tubes, beads, discs or
microplates, or any other surface suitable for conducting an immunoassay. The binding
processes are well known in the art and generally consist of cross-linking, covalently binding or
physically adsorbing the antibody to the support; the polymer-antibody complex is then washed
in preparation for the test sample.
An aliquot of the sample to be tested is then added to the solid phase complex and incubated for
a period of time sufficient (e.g. 2-40 minutes or overnight if more convenient) and under suitable
conditions (e.g. from room temperature to 37 °C) to allow binding of any antigen present to the
antibody. Following the incubation period, the antibody-antigen solid phase is washed and dried
and incubated with a second antibody specific for a portion of the hapten. The second antibody
is linked to a reporter molecule which is used to indicate the binding of the second antibody to
the hapten.

An alternative method involves immobilizing the target molecules in the biological sample and
then exposing the immobilized target to specific antibody which may or may not be labelled with
a reporter molecule. Depending on the amount of target and the strength of the reporter
molecule signal, a bound target may be detectable by direct labelling with the antibody.
Alternatively, a second labelled antibody, specific to the first antibody, is exposed to the
target-first antibody complex to form a target-first antibody-second antibody tertiary complex. The
complex is detected by the signal emitted by the reporter molecule.

By "reporter molecule" as used in the present specification, is meant a molecule which, by its
chemical nature, provides an analytically identifiable signal which allows the detection of
antigen-bound antibody. Detection may be either qualitative or quantitative. The most commonly used
reporter molecules in this type of assay are enzymes, fluorophores, radionuclide-containing
molecules (i.e. radioisotopes) and chemiluminescent molecules.

In the case of an enzyme immunoassay, an enzyme is conjugated to the second antibody,
generally by means of glutaraldehyde or periodate. As will be readily recognized, however, a
wide variety of different conjugation techniques exist, which are readily available to the skilled
artisan. Commonly used enzymes include horseradish peroxidase, glucose oxidase,
beta-galactosidase and alkaline phosphatase, amongst others. The substrates to be used with the
specific enzymes are generally chosen for the production, upon hydrolysis by the corresponding
enzyme, of a detectable colour change. Examples of suitable enzymes include alkaline
phosphatase and peroxidase. It is also possible to employ fluorogenic substrates, which yield a
fluorescent product rather than the chromogenic substrates noted above. In all cases, the
enzyme-labelled antibody is added to the first antibody-hapten complex, allowed to bind, and
then the excess reagent is washed away. A solution containing the appropriate substrate is then
added to the complex of antibody-antigen-antibody. The substrate will react with the enzyme
linked to the second antibody, giving a qualitative visual signal, which may be further quantitated,
usually spectrophotometrically, to give an indication of the amount of hapten which was present
in the sample. "Reporter molecule" also extends to use of cell agglutination or inhibition of
agglutination such as red blood cells or latex beads, and the like.

Alternatively, fluorescent compounds, such as fluorescein and rhodamine, may be chemically
coupled to antibodies without altering their binding capacity. When activated by illumination
with light of a particular wavelength, the fluorochrome-labelled antibody absorbs the light
energy, inducing a state of excitability in the molecule, followed by emission of the light at a
characteristic colour visually detectable with a light microscope. As in the EIA, the
fluorescent-labelled antibody is allowed to bind to the first antibody-hapten complex. After washing off the
unbound reagent, the remaining tertiary complex is then exposed to light of the appropriate
wavelength; the fluorescence observed indicates the presence of the hapten of interest.
Immunofluorescence and EIA techniques are both very well established in the art and are
particularly preferred for the present method. However, other reporter molecules, such as
radioisotope, chemiluminescent or bioluminescent molecules, may also be employed.

As stated above, the present invention extends to genetic constructs capable of encoding MCG4,
MCG7 or MCG18 or functional derivatives thereof. Such genetic constructs are also
contemplated to be useful in modulating expression of specific genes in which mcg4, mcg7 or
mcg18 is involved in tissue-specific or temporal regulation.

Accordingly, another aspect of the present invention is directed to a genetic construct comprising
a nucleotide sequence encoding a peptide, polypeptide or protein and mcg4, mcg7 or mcg18 or
a functional derivative or homologue thereof capable of modulating the expression of said
nucleotide sequence.

As stated above, MCG18 is proposed to have a role in tumour suppression. Accordingly, it is
further proposed in accordance with the present invention to use recombinant MCG18 in
pharmaceutical preparations for treating, arresting or otherwise ameliorating the effects of certain
cancers.

Accordingly, another aspect of the present invention contemplates a method for treating,
arresting or otherwise ameliorating the effects of a cancer in an animal or bird, said method
comprising administering to said animal or bird an effective amount of MCG18 or a functional
derivative thereof for a time and under conditions sufficient to treat, arrest or otherwise
ameliorate the effects of said cancer.

The present invention, therefore, contemplates a pharmaceutical composition comprising
MCG18 or a derivative thereof or a modulator of mcg18 expression or MCG18 activity and one
or more pharmaceutically acceptable carriers and/or diluents. These components are referred
to hereinafter as the "active ingredients". The active ingredients may also include anti-cancer
agents or agents which facilitate actions of MCG18.

The pharmaceutical forms suitable for injectable use include sterile aqueous solutions (where
water soluble) and sterile powders for the extemporaneous preparation of sterile injectable
solutions. The form must be stable under the conditions of manufacture and storage and must be
preserved against the contaminating action of microorganisms such as bacteria and fungi. The
carrier may be a solvent medium containing, for example, water, ethanol, polyol (for example,
glycerol, propylene glycol and liquid polyethylene glycol and the like), suitable mixtures thereof,
and vegetable oils. The proper fluidity can be maintained, for example, by the use of a coating
such as lecithin and by the use of surfactants. The prevention of the action of microorganisms
can be brought about by various antibacterial and antifungal agents, for example, parabens,
chlorobutanol, phenol, sorbic acid, thimerosal and the like. In many cases, it will be preferable to
include isotonic agents, for example, sugars or sodium chloride. Prolonged absorption of the
injectable compositions can be brought about by the use in the compositions of agents delaying
absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions are prepared by incorporating the active compounds in the required
amount in the appropriate solvent with various of the other ingredients enumerated above, as
required, followed by filter sterilization. In the case of sterile powders for the preparation of
sterile injectable solutions, the preferred methods of preparation are vacuum drying and the
freeze-drying technique which yield a powder of the active ingredient plus any additional desired
ingredient from a previously sterile-filtered solution thereof.

When the active ingredients are suitably protected they may be orally administered, for example,
with an inert diluent or with an assimilable edible carrier, or they may be enclosed in hard or soft
shell gelatin capsules, or they may be compressed into tablets, or they may be incorporated
directly with the food of the diet. For oral therapeutic administration, the active compound may be
incorporated with excipients and used in the form of ingestible tablets, buccal tablets, troches,
capsules, elixirs, suspensions, syrups, wafers, and the like. Such compositions and preparations
should contain at least 1% by weight of active compound. The percentage of the compositions
and preparations may, of course, be varied and may conveniently be between about 5% and about
80% of the weight of the unit. The amount of active compound in such therapeutically useful
compositions is such that a suitable dosage will be obtained. Preferred compositions or
preparations according to the present invention are prepared so that an oral dosage unit form
contains between about 0.1 µg and 2000 mg of active compound.

The tablets, troches, pills, capsules and the like may also contain the components as listed
hereafter. A binder such as gum, acacia, corn starch or gelatin; excipients such as dicalcium
phosphate; a disintegrating agent such as corn starch, potato starch, alginic acid and the like;
a lubricant such as magnesium stearate; and a sweetening agent such as sucrose, lactose or
saccharin may be added or a flavouring agent such as peppermint, oil of wintergreen, or cherry
flavouring. When the dosage unit form is a capsule, it may contain, in addition to materials of
the above type, a liquid carrier. Various other materials may be present as coatings or to
otherwise modify the physical form of the dosage unit. For instance, tablets, pills, or capsules
may be coated with shellac, sugar or both. A syrup or elixir may contain the active compound,
sucrose as a sweetening agent, methyl and propylparabens as preservatives, a dye and flavouring
such as cherry or orange flavour. Of course, any material used in preparing any dosage unit form
should be pharmaceutically pure and substantially non-toxic in the amounts employed. In
addition, the active compound(s) may be incorporated into sustained-release preparations and
formulations.

The present invention also extends to forms suitable for topical application such as creams,
lotions and gels.

Pharmaceutically acceptable carriers and/or diluents include any and all solvents, dispersion
media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents and
the like. The use of such media and agents for pharmaceutically active substances is well known
in the art. Except insofar as any conventional media or agent is incompatible with the active
ingredient, use thereof in the therapeutic compositions is contemplated. Supplementary active
ingredients can also be incorporated into the compositions.

It is especially advantageous to formulate parenteral compositions in dosage unit form for ease
of administration and uniformity of dosage. Dosage unit form as used herein refers to physically
discrete units suited as unitary dosages for the mammalian subjects to be treated; each unit
containing a predetermined quantity of active material calculated to produce the desired
therapeutic effect in association with the required pharmaceutical carrier. The specifications for
the novel dosage unit forms of the invention are dictated by and directly dependent on (a) the
unique characteristics of the active material and the particular therapeutic effect to be achieved,
and (b) the limitations inherent in the art of compounding such an active material for the
treatment of disease in living subjects having a diseased condition in which bodily health is
impaired as herein disclosed in detail.

The principal active ingredient is compounded for convenient and effective administration in
effective amounts with a suitable pharmaceutically acceptable carrier in dosage unit form as
hereinbefore disclosed. A unit dosage form can, for example, contain the principal active
compound in amounts ranging from 0.5 µg to about 2000 mg. Expressed in proportions, the
active compound is generally present in from about 0.5 µg to about 2000 mg/ml of carrier. In
the case of compositions containing supplementary active ingredients, the dosages are
determined by reference to the usual dose and manner of administration of the said ingredients.

Effective amounts contemplated by the present invention include those amounts effective to
ameliorate a condition. For example, it is envisaged that effective amounts would range from
about 0.001 µg/kg body weight to about 100 mg/kg body weight. Alternatively, effective
amounts may range from about 0.01 µg/kg body weight to about 10 mg/kg body weight or even
from about 0.1 µg/kg body weight to about 1 mg/kg body weight. Administration may be per
minute, hour, day, week, month or year or may only be a once-off administration.
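
The conversion of the per-kilogram ranges stated above into absolute amounts for a given
subject is simple arithmetic, sketched below in Python for illustration only. The function
name and the 70 kg body weight are hypothetical; the default limits are the broadest range
stated above (0.001 µg/kg to 100 mg/kg).

```python
def dose_range_mg(body_weight_kg: float,
                  low_ug_per_kg: float = 0.001,
                  high_mg_per_kg: float = 100.0) -> tuple[float, float]:
    """Return (lowest, highest) amount in mg for a subject of the given weight.

    The lower limit is expressed in micrograms per kg and the upper limit in
    milligrams per kg, so the lower limit is divided by 1000 to convert to mg.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low_mg = low_ug_per_kg * body_weight_kg / 1000.0
    high_mg = high_mg_per_kg * body_weight_kg
    return low_mg, high_mg


# Hypothetical 70 kg subject, broadest range stated above
low, high = dose_range_mg(70.0)
print(f"{low * 1000:.2f} ug to {high:.0f} mg")

# Narrowest range stated above: 0.1 ug/kg to 1 mg/kg
low, high = dose_range_mg(70.0, low_ug_per_kg=0.1, high_mg_per_kg=1.0)
print(f"{low * 1000:.1f} ug to {high:.0f} mg")
```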

The pharmaceutical composition may also comprise genetic molecules such as a vector capable
of transfecting target cells where the vector carries a nucleic acid molecule capable of modulating
mcg18 expression or MCG18 activity. The vector may, for example, be a viral vector.

As stated above, the present invention further contemplates a range of derivatives of MCG18. 

Derivatives include fragments, parts, portions, mutants, homologues and analogues of the
MCG18 polypeptide and corresponding genetic sequence. Derivatives also include single or
multiple amino acid substitutions, deletions and/or additions to MCG18 or single or multiple
nucleotide substitutions, deletions and/or additions to the genetic sequence encoding MCG18.
"Additions" to amino acid sequences or nucleotide sequences include fusions with other
peptides, polypeptides or proteins or fusions to nucleotide sequences. Reference herein to
"MCG18" includes reference to all derivatives thereof including functional derivatives or
immunologically interactive derivatives.

Analogues of MCG18 contemplated herein include, but are not limited to, modification to side
chains, incorporation of unnatural amino acids and/or their derivatives during peptide,
polypeptide or protein synthesis and the use of crosslinkers and other methods which impose
conformational constraints on the proteinaceous molecule or their analogues.

Examples of side chain modifications contemplated by the present invention include
modifications of amino groups such as by reductive alkylation by reaction with an aldehyde
followed by reduction with NaBH4; amidination with methylacetimidate; acylation with acetic
anhydride; carbamoylation of amino groups with cyanate; trinitrobenzylation of amino groups
with 2,4,6-trinitrobenzene sulphonic acid (TNBS); acylation of amino groups with succinic
anhydride and tetrahydrophthalic anhydride; and pyridoxylation of lysine with
pyridoxal-5-phosphate followed by reduction with NaBH4.

The guanidine group of arginine residues may be modified by the formation of heterocyclic
condensation products with reagents such as 2,3-butanedione, phenylglyoxal and glyoxal.

The carboxyl group may be modified by carbodiimide activation via O-acylisourea formation
followed by subsequent derivatisation, for example, to a corresponding amide.

Sulphydryl groups may be modified by methods such as carboxymethylation with iodoacetic acid
or iodoacetamide; performic acid oxidation to cysteic acid; formation of mixed disulphides
with other thiol compounds; reaction with maleimide, maleic anhydride or other substituted
maleimide; formation of mercurial derivatives using 4-chloromercuribenzoate,
4-chloromercuriphenylsulphonic acid, phenylmercury chloride, 2-chloromercuri-4-nitrophenol and
other mercurials; carbamoylation with cyanate at alkaline pH.

Tryptophan residues may be modified by, for example, oxidation with N-bromosuccinimide or
alkylation of the indole ring with 2-hydroxy-5-nitrobenzyl bromide or sulphenyl halides.
Tyrosine residues, on the other hand, may be altered by nitration with tetranitromethane to form
a 3-nitrotyrosine derivative.

Modification of the imidazole ring of a histidine residue may be accomplished by alkylation with
iodoacetic acid derivatives or N-carbethoxylation with diethylpyrocarbonate.

Examples of incorporating unnatural amino acids and derivatives during peptide synthesis
include, but are not limited to, use of norleucine, 4-amino butyric acid,
4-amino-3-hydroxy-5-phenylpentanoic acid, 6-aminohexanoic acid, t-butylglycine, norvaline,
phenylglycine, ornithine, sarcosine, 4-amino-3-hydroxy-6-methylheptanoic acid, 2-thienyl alanine
and/or D-isomers of amino acids. A list of unnatural amino acids contemplated herein is shown
in Table 3.

TABLE 3



Non-conventional amino acid     Code      Non-conventional amino acid     Code

α-aminobutyric acid             Abu       L-N-methylalanine               Nmala
α-amino-α-methylbutyrate        Mgabu     L-N-methylarginine              Nmarg
aminocyclopropane-carboxylate   Cpro      L-N-methylasparagine            Nmasn
                                          L-N-methylaspartic acid         Nmasp
aminoisobutyric acid            Aib       L-N-methylcysteine              Nmcys
aminonorbornyl-carboxylate      Norb      L-N-methylglutamine             Nmgln
                                          L-N-methylglutamic acid         Nmglu
cyclohexylalanine               Chexa     L-N-methylhistidine             Nmhis
cyclopentylalanine              Cpen      L-N-methylisoleucine            Nmile
D-alanine                       Dal       L-N-methylleucine               Nmleu
D-arginine                      Darg      L-N-methyllysine                Nmlys
D-aspartic acid                 Dasp      L-N-methylmethionine            Nmmet
D-cysteine                      Dcys      L-N-methylnorleucine            Nmnle
D-glutamine                     Dgln      L-N-methylnorvaline             Nmnva
D-glutamic acid                 Dglu      L-N-methylornithine             Nmorn
D-histidine                     Dhis      L-N-methylphenylalanine         Nmphe
D-isoleucine                    Dile      L-N-methylproline               Nmpro
D-leucine                       Dleu      L-N-methylserine                Nmser
D-lysine                        Dlys      L-N-methylthreonine             Nmthr
D-methionine                    Dmet      L-N-methyltryptophan            Nmtrp
D-ornithine                     Dorn      L-N-methyltyrosine              Nmtyr
D-phenylalanine                 Dphe      L-N-methylvaline                Nmval
D-proline                       Dpro      L-N-methylethylglycine          Nmetg
D-serine                        Dser      L-N-methyl-t-butylglycine       Nmtbug
D-threonine                     Dthr      L-norleucine                    Nle
D-tryptophan                    Dtrp      L-norvaline                     Nva
D-tyrosine                      Dtyr      α-methyl-aminoisobutyrate       Maib
D-valine                        Dval      α-methyl-γ-aminobutyrate        Mgabu
D-α-methylalanine               Dmala     α-methylcyclohexylalanine       Mchexa
D-α-methylarginine              Dmarg     α-methylcyclopentylalanine      Mcpen
D-α-methylasparagine            Dmasn     α-methyl-α-naphthylalanine      Manap
D-α-methylaspartate             Dmasp     α-methylpenicillamine           Mpen
D-α-methylcysteine              Dmcys     N-(4-aminobutyl)glycine         Nglu
D-α-methylglutamine             Dmgln     N-(2-aminoethyl)glycine         Naeg
D-α-methylhistidine             Dmhis     N-(3-aminopropyl)glycine        Norn
D-α-methylisoleucine            Dmile     N-amino-α-methylbutyrate        Nmaabu
D-α-methylleucine               Dmleu     α-naphthylalanine               Anap
D-α-methyllysine                Dmlys     N-benzylglycine                 Nphe
D-α-methylmethionine            Dmmet     N-(2-carbamylethyl)glycine      Ngln
D-α-methylornithine             Dmorn     N-(carbamylmethyl)glycine       Nasn
D-α-methylphenylalanine         Dmphe     N-(2-carboxyethyl)glycine       Nglu
D-α-methylproline               Dmpro     N-(carboxymethyl)glycine        Nasp
D-α-methylserine                Dmser     N-cyclobutylglycine             Ncbut
D-α-methylthreonine             Dmthr     N-cycloheptylglycine            Nchep
D-α-methyltryptophan            Dmtrp     N-cyclohexylglycine             Nchex
D-α-methyltyrosine              Dmty      N-cyclodecylglycine             Ncdec
D-α-methylvaline                Dmval     N-cyclododecylglycine           Ncdod
D-N-methylalanine               Dnmala    N-cyclooctylglycine             Ncoct
D-N-methylarginine              Dnmarg    N-cyclopropylglycine            Ncpro
D-N-methylasparagine            Dnmasn    N-cycloundecylglycine           Ncund
D-N-methylaspartate             Dnmasp    N-(2,2-diphenylethyl)glycine    Nbhm
D-N-methylcysteine              Dnmcys    N-(3,3-diphenylpropyl)glycine   Nbhe
D-N-methylglutamine             Dnmgln    N-(3-guanidinopropyl)glycine    Narg
D-N-methylglutamate             Dnmglu    N-(1-hydroxyethyl)glycine       Nthr
D-N-methylhistidine             Dnmhis    N-(hydroxyethyl)glycine         Nser
D-N-methylisoleucine            Dnmile    N-(imidazolylethyl)glycine      Nhis
D-N-methylleucine               Dnmleu    N-(3-indolylethyl)glycine       Nhtrp
D-N-methyllysine                Dnmlys    N-methyl-γ-aminobutyrate        Nmgabu
N-methylcyclohexylalanine       Nmchexa   D-N-methylmethionine            Dnmmet
D-N-methylornithine             Dnmorn    N-methylcyclopentylalanine      Nmcpen
N-methylglycine                 Nala      D-N-methylphenylalanine         Dnmphe
N-methylaminoisobutyrate        Nmaib     D-N-methylproline               Dnmpro
N-(1-methylpropyl)glycine       Nile      D-N-methylserine                Dnmser
N-(2-methylpropyl)glycine       Nleu      D-N-methylthreonine             Dnmthr
D-N-methyltryptophan            Dnmtrp    N-(1-methylethyl)glycine        Nval
D-N-methyltyrosine              Dnmtyr    N-methyl-α-naphthylalanine      Nmanap
D-N-methylvaline                Dnmval    N-methylpenicillamine           Nmpen
γ-aminobutyric acid             Gabu      N-(p-hydroxyphenyl)glycine      Nhtyr
L-t-butylglycine                Tbug      N-(thiomethyl)glycine           Ncys
L-ethylglycine                  Etg       penicillamine                   Pen
L-homophenylalanine             Hphe      L-α-methylalanine               Mala
L-α-methylarginine              Marg      L-α-methylasparagine            Masn
L-α-methylaspartate             Masp      L-α-methyl-t-butylglycine       Mtbug
L-α-methylcysteine              Mcys      L-methylethylglycine            Metg
L-α-methylglutamine             Mgln      L-α-methylglutamate             Mglu
L-α-methylhistidine             Mhis      L-α-methylhomophenylalanine     Mhphe
L-α-methylisoleucine            Mile      N-(2-methylthioethyl)glycine    Nmet
L-α-methylleucine               Mleu      L-α-methyllysine                Mlys
L-α-methylmethionine            Mmet      L-α-methylnorleucine            Mnle
L-α-methylnorvaline             Mnva      L-α-methylornithine             Morn
L-α-methylphenylalanine         Mphe      L-α-methylproline               Mpro
L-α-methylserine                Mser      L-α-methylthreonine             Mthr
L-α-methyltryptophan            Mtrp      L-α-methyltyrosine              Mtyr
L-α-methylvaline                Mval      L-N-methylhomophenylalanine     Nmhphe
N-(N-(2,2-diphenylethyl)        Nnbhm     N-(N-(3,3-diphenylpropyl)       Nnbhe
carbamylmethyl)glycine                    carbamylmethyl)glycine
1-carboxy-1-(2,2-diphenyl-      Nmbc
ethylamino)cyclopropane



Crosslinkers can be used, for example, to stabilise 3D conformations, using homo-bifunctional
crosslinkers such as the bifunctional imido esters having (CH2)n spacer groups with n=1 to n=6,
glutaraldehyde, N-hydroxysuccinimide esters and hetero-bifunctional reagents which usually
contain an amino-reactive moiety such as N-hydroxysuccinimide and another group
specific-reactive moiety such as maleimido or dithio moiety (SH) or carbodiimide (COOH). In
addition, peptides can be conformationally constrained by, for example, incorporation of Cα and
Nα-methylamino acids, introduction of double bonds between Cα and Cβ atoms of amino acids and
the formation of cyclic peptides or analogues by introducing covalent bonds such as forming an
amide bond between the N and C termini, between two side chains or between a side chain and
the N or C terminus.

Such analogues also apply in respect of MCG4 and MCG7. 

The present invention further contemplates chemical analogues of MCG18 capable of acting as
antagonists or agonists of MCG18 or which can act as functional analogues of MCG18.
Chemical analogues may not necessarily be derived from MCG18 but may share certain
conformational similarities. Alternatively, chemical analogues may be specifically designed to
mimic certain physicochemical properties of MCG18. Chemical analogues may be chemically
synthesised or may be detected following, for example, natural product screening.

The identification of MCG18 permits the generation of a range of therapeutic molecules capable
of modulating expression of MCG18 or modulating the activity of MCG18. Modulators
contemplated by the present invention include agonists and antagonists of MCG18 expression.
Antagonists of MCG18 expression include antisense molecules, ribozymes and co-suppression
molecules. Agonists include molecules which increase promoter ability or interfere with negative
regulatory mechanisms. Agonists of MCG18 include molecules which overcome any negative
regulatory mechanism. Antagonists of MCG18 include antibodies and inhibitor peptide
fragments.

These types of modifications may be important to stabilise MCG18 if administered to an 
individual or for use as a diagnostic reagent. 

Other derivatives contemplated by the present invention include a range of glycosylation variants
from a completely unglycosylated molecule to a modified glycosylated molecule. Altered
glycosylation patterns may result from expression of recombinant molecules in different host
cells.

Another embodiment of the present invention contemplates a method for modulating expression
of MCG18 in a human, said method comprising contacting the mcg18 gene encoding MCG18
with an effective amount of a modulator of mcg18 expression for a time and under conditions
sufficient to up-regulate or down-regulate or otherwise modulate expression of mcg18. For
example, a nucleic acid molecule encoding MCG18 or a derivative thereof may be introduced
into a cell to facilitate protection of that cell from becoming cancerous.

Another aspect of the present invention contemplates a method of modulating activity of MCG18
in a human, said method comprising administering to said human a modulating effective amount
of a molecule for a time and under conditions sufficient to increase or decrease MCG18 activity.
The molecule may be a proteinaceous molecule or a chemical entity and may also be a derivative
of MCG18 or a chemical analogue or truncation mutant of MCG18.



The present invention is further described with reference to the following non-limiting Examples. 



WO 98/53061 



PCT/AU98/00380 



EXAMPLE 1 



A human gene (designated mcg4) was identified on chromosome 11q13 that on the basis of
sequence homology is predicted to encode a putative transcription factor of 310 amino acids
(Fig. 1). mcg4 is transcribed in several different cell lines (Fig. 7).

EXAMPLE 2 

The expressed sequence tag (EST) database contains partial sequence data for the murine
(Fig. 2) and nematode (Fig. 3) homologues of mcg4.

EXAMPLE 3 

MCG4 contains a sequence of cysteine residues within the N-terminal region of the protein that
resembles zinc-finger binding domains of a novel type, i.e. (HC3)2 (Fig. 4).

EXAMPLE 4 

Sensitive sequence homology searches reveal that related cysteine-containing motifs are present
in another C. elegans protein (Fig. 5) as well as the GATA-binding transcription factor from
S. pombe (Fig. 6).

EXAMPLE 5

mcg4 will have commercial value due to its likelihood of encoding a novel transcription factor
that is highly conserved amongst organisms, thus suggesting an integral role in gene regulation.
mcg4 may also be involved in some way in tissue-specific or temporal regulation of certain genes,
thus making it a potential target for modulating expression of those downstream effectors.



EXAMPLE 6 

Nucleotide sequence data generated from cosmid clone cSRL-72c4 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned to
the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match numerous human and mouse entries (Table 4 and Figure 2).
These matching ESTs were further used to identify overlapping entries in the EST database
(Table 5). The nucleotide sequences of these human ESTs were compiled using MacVector
4.2.1 software (IBI-Kodak) to produce the cDNA sequence shown in Figure 1. EST entries
AA074703 and AA134788 are closely related at the nucleotide level to mcg4 and it is, therefore,
likely that mcg4 is a member of a newly discovered gene family (Figure 8).

The cDNA sequence of mcg4 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) at
the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). As the
protein appeared to be novel, a translation of the longest reading frame for the mcg4 cDNA was
aligned to the EST database using the program TBLASTN, which performed a dynamic
translation of the EST database in all 6 frames. The search results indicated that the nematode
C. elegans had an MCG4-like protein (Figure 3), with the matching domains containing a spatial
sequence of cysteine and histidine residues which resembled a zinc-finger structure (Figure 4).
The program BLASTP was used, therefore, to conduct sensitive searches of the protein
databases for similar zinc-finger motifs. A weak match to the putative zinc-finger domain was
observed for another protein from C. elegans (Figure 5) and a poorer match for the
GATA-binding transcription factor from S. pombe (Figure 6). The putative initiation codon of
human mcg4 is not preceded by an in-frame stop codon and it is therefore possible that the cDNA
described in Figure 1 is a truncated form. However, sequence alignment of human and mouse
mcg4 ESTs showed a lower degree of nucleotide conservation prior to the assigned initiation
codon, thus supporting the notion that the region represents the 5' UTR (Figure 9).

To determine the expression pattern of mcg4, 15 µg of total cellular RNA (RNeasy Mini Kit,
Qiagen) from various human cell lines grown in culture were electrophoresed through 1.2% w/v
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer
using 20 x SSC (Sambrook et al, 1989). Filters were subsequently UV-fixed and hybridised
overnight at 65°C to a radiolabeled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for
mcg4. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried
and exposed to X-ray film. This Northern analysis showed that mcg4 is expressed as a 1.6kb
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 7).
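
The translation of a nucleotide sequence in all six reading frames, as performed during the BLASTX and TBLASTN searches, can be illustrated with a minimal Python sketch. This is not the BLAST implementation: the function names are hypothetical, only the standard genetic code is assumed, and the test sequence is the myc-tag coding sequence listed in Example 15.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to their translations."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def longest_orf(seq):
    """Return (frame, peptide) for the longest Met-initiated stretch without a stop."""
    best = ("", "")
    for frame, protein in six_frame_translations(seq).items():
        for segment in protein.split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (frame, segment[start:])
    return best


if __name__ == "__main__":
    myc_tag = "ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG"
    print(translate(myc_tag))                   # MEQKLISEEDL
    print(longest_orf("CC" + myc_tag + "TAA"))  # ('+3', 'MEQKLISEEDL')
```

The longest open reading frame is taken here simply as the longest stretch beginning with methionine and containing no stop codon; database search programs apply additional scoring that this sketch does not attempt.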

EXAMPLE 7 

A human gene (designated mcg7) was identified and isolated from chromosome 11q13 which
encodes a protein that bears striking homology with guanine nucleotide exchange factors (GEFs)
from a wide variety of organisms (Fig. 12).

EXAMPLE 8 

The composite mcg7 cDNA sequence is at least 2.4kb in length and Figure 13(a) shows a
predicted translation product of at least 609 amino acids beginning at methionine 120. An
alternative start site due to alternate exon splicing (indicated in lower case) may yield a protein
of 671 amino acids starting at methionine 58 (Fig. 13a).

EXAMPLE 9

An mcg7 homologue from C. elegans has been identified, the product of which is highly
conserved with that of MCG7 (Fig. 14). There are several salient features of the protein which
have been underlined in Fig. 14, namely: a guanine nucleotide binding region, a diacylglycerol
binding region, and "EF-hand" calcium binding regions. In addition, there are several potential
cAMP, protein kinase C, and casein kinase II phosphorylation sites, as well as a number of
potential sites for glycosylation (not indicated).

EXAMPLE 10 

A number of partial human and murine EST clones exist for mcg7. The GenBank database
contains a cDNA (Acc. no. Y12336) encoding a full-length open reading frame (ORF) for human
mcg7 as well as a partial murine mcg7 ORF (Y12339). In addition, the complete genomic
sequence of the human mcg7 gene is contained within GenBank entry AC000134.

EXAMPLE 11

The best characterised GEFs are members of the family of ras oncoproteins, which play a pivotal
role in signal transduction and when mutated are responsible for tumour development. A variety
of therapeutic regimes for cancer treatment have been designed to specifically interfere with the
ras signalling pathways. There is potential, therefore, that the product of mcg7 could also be a
target for such clinical strategies.

EXAMPLE 12 

The nucleotide sequence for mcg7 cDNA was extended 5' with genomic DNA sequence from
GenBank accession number AC000134 (positions 1-321) and analysed for additional coding
sequence 5' to the putative initiation codon (nt 681-683) (Fig. 16). An additional in-frame ATG
occurs at position nt 495-497 when the alternatively spliced exon (position nt 504-609) is present
(also shown in Fig. 13(a)). This closely matches the Kozak consensus. When this exon is
absent, then the ATG is not in-frame and other possible initiation codons are absent (resulting
translation shown in lower case lettering) (also shown in Fig. 13(b)). Further evidence that the
initiation codon at position nt 681-683 is the true initiation site is given in Figure 15.
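
The reading-frame relationships stated above can be checked with simple modular arithmetic. The following Python sketch uses only the nucleotide positions quoted in this Example and the protein lengths quoted in Example 8; the function name `in_frame` is hypothetical.

```python
# Nucleotide positions quoted in Example 12 (first base of each codon, 1-based).
UPSTREAM_ATG = 495
PUTATIVE_ATG = 681
EXON_START, EXON_END = 504, 609  # alternatively spliced exon


def in_frame(first_pos, second_pos, removed=0):
    """True if two codon starts share a frame after deleting `removed` intervening bases."""
    return (second_pos - first_pos - removed) % 3 == 0


exon_length = EXON_END - EXON_START + 1                   # 106 nt, not a multiple of 3
extra_codons = (PUTATIVE_ATG - UPSTREAM_ATG) // 3         # codons added by the upstream ATG

print(in_frame(UPSTREAM_ATG, PUTATIVE_ATG))               # True: exon present
print(in_frame(UPSTREAM_ATG, PUTATIVE_ATG, exon_length))  # False: exon absent
print(extra_codons)                                       # 62
```

The 62 additional codons agree with the difference between the 671 and 609 amino acid products described in Example 8, and the 106 nucleotide exon length accounts for the loss of frame when the exon is absent.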

Alignment of human and a partial murine mcg7 cDNA sequences is shown in Figure 15. The
putative initiation codon is at position nt 360-362. Both murine ESTs appear to have an
upstream in-frame stop codon at position nt 326-328, downstream of the differentially spliced
exon, and the sequence alignment thus suggests that this region represents the 5' UTR of mcg7.
Furthermore, similarity with the C. elegans homologue strongly suggests that the ATG codon at
position nt 360-362 encodes the N-terminus of MCG7.

EXAMPLE 13

Figure 17 shows data from experiments indicating that a truncated version of MCG7 when
expressed as a GST fusion protein (construct B in Fig. 18) can function as a Ras-guanine
nucleotide exchange factor. In brief, Ras (unprocessed and as a GST fusion protein) is loaded
with 3H-GDP then incubated in the presence of excess cold GTP ± GST-MCG7. Full details of
this assay can be found in Porfiri et al.

EXAMPLE 14 

Nucleotide sequence data generated from cosmid clone cSRL-20h12 with the T7 primer
(Promega, and Applied Biosystems Incorporated dye terminator sequencing kit) were aligned
to the GenBank Expressed Sequence Tag (EST) database using the program BLASTN (Altschul
et al, 1990) and were found to match GenBank entries T78563 (clone 113434), T09103 (clone
HIBBP12) and AA035643 (clone 471819). EST clones 113434 and 471819 were obtained from
Genome Systems Inc. and these DNAs were sequenced on both strands with gene-specific
primers (Table 5) to generate the cDNA sequence of mcg7 shown in Figures 13(a) and (b).

The cDNA sequence of mcg7 was translated in all possible reading frames and compared to the
GenBank non-redundant protein database using the program BLASTX (Altschul et al, 1990) and
the coding region was assigned on the basis of showing homology to the C. elegans protein
F25B3.3 (Figure 14). The mcg7 cDNA composite was suspected to contain a single nucleotide
error that originated from clone 471819 and the correct nucleotide sequence was, therefore,
sought by reverse transcription-polymerase chain reaction (RT-PCR) of the cDNA fragment
from a human cDNA pool. Total RNA was extracted from a human lymphoblastoid cell line
using an RNeasy Mini Kit (Qiagen). cDNA synthesis was conducted with the reverse
transcriptase Superscript II RNaseH- (GIBCO, BRL) and random hexamers using the procedure
recommended by the manufacturer (GIBCO, BRL). One fortieth of the cDNA mix was
subjected to 35 cycles of PCR using the following cycling conditions: 94°C for 30 seconds, 58°C
for 30 seconds and 72°C for 90 seconds. The 50 µl reaction mix consisted of 1x reaction buffer
(Dade Scientific), 2mM dNTP mix, 20pmol of primers (see Table 6) MCG7UF (within the
variably spliced exon of Figure 13(b), between nucleotide positions 184-201) and SGCADRV2
(between nucleotide positions 866-846 of Figure 13(a)) and 10 units of Dynazyme (Dade
Scientific). The resulting PCR product was cloned into the pGEM-T vector (Promega) using
standard methodology and sequenced using gene-specific primers. The correct nucleotide
sequence of mcg7 (as shown in Figure 13(a)) matches that of the recently released GenBank entry
Y12336. A partial mouse mcg7 cDNA sequence can also be found in GenBank entry Y12339.

EXAMPLE 15 

The coding sequence of mcg7 was cloned into vectors for expression in both bacterial and
mammalian cells. In addition to the full-length constructs, the deletion constructs shown in
Figure 18 were designed to retain the guanine nucleotide exchange (GEF) domain. For
prokaryotic expression, the mcg7 coding region was inserted downstream of and in-frame with
the Sj26 cassette of the pGEX (Pharmacia) series of vectors (Smith and Johnson, 1988) using
standard cloning techniques (Sambrook et al, 1989). For mammalian expression, the mcg7
coding sequence was first myc-tagged at the N-terminus and then ligated into the expression
vector pc Exv-n using standard cloning techniques. Ligation junctions of the constructs were
sequenced, as the cloning strategies inadvertently changed or introduced additional amino acids
as shown below.

Construct (A): EST clone 113434 was digested with ApaI (Figure 13(a), nucleotide positions
1022 to >2416 (within the vector)), blunt-ended with T4 DNA polymerase according to the
specifications of the manufacturer (New England Biolabs) and ligated into the SmaI site of
pGEX-3X.

Sequence of the pGEX and mcg7 (underlined) junction:

            pGEX-3X        mcg7 (1022)
Sj26 ...    GGG ATC CCC    CTG GTC        [SEQ ID NO:19]
            Gly Ile Pro
            (additional amino acids)

Construct (B): EST clone 113434 was digested with EcoRI (Figure 13(a), nucleotide
positions <695 (within the vector) to 1711) and ligated into the EcoRI site of pGEX-1.

Sequence of the pGEX and mcg7 (underlined) junction:

            pGEX-1                  mcg7 (695)
Sj26 ...    GAA TTC GGC ACG AG      C CGA CGG      [SEQ ID NO:20]
            Glu Phe Gly Thr Ser
            (additional amino acids)

Construct (C): full-length mcg7: The pGEM-T clone containing the 5' end of the mcg7 coding
region was digested with ApaI (subsequently blunt-ended with T4 DNA polymerase) and BstXI
to liberate the fragment between nucleotide positions 336 and 830 of Figure 13(a). Clone
113434 was digested with BstXI and HindIII (vector derived) to liberate a fragment between
nucleotide positions 830 and >2416 (vector derived) of Figure 13(a). A pGEM-11Zf vector
(Promega) containing the myc-tag was digested with ApaI (subsequently blunt-ended with T4
DNA polymerase) and HindIII, and ligated with the 2 inserts described above.

Sequence of the myc-tag/mcg7 junction [SEQ ID NOs:21/22]:

myc-tag                           vector       BamHI  mcg7 5' UTR (337)        start
ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG CCCGGGGCAGCT ggatcc GCAGCCCACCCCGCGCCGGCGGCC ATG
MEQKLISEEDL                       PGAA         GS     AAHPAPAA                 M
                                  (additional amino acids)

The myc-tagged full-length mcg7 insert in pGEM-11Zf was then excised with SacI and HindIII
(both vector derived) and directionally cloned into the mammalian expression vector pEXV
(Beranger et al, 1994).

Construct (D): Construct (C) in pGEM-11Zf was sequentially digested with HindIII (this site
was subsequently blunt-ended with T4 DNA polymerase) then BamHI, and ligated into pGEX-2T
digested with BamHI and SmaI. Digestion with BamHI removed the myc-tag of Construct (C).

Sequence of the pGEX and mcg7 [SEQ ID NO:23/24] (underlined) junction:

pGEX-2T     BamHI      mcg7 (337)
Sj26 ...    gga tcc    GCA GCC CAC CCC GCG CCG GCG GCC ATG
            Gly Ser    Ala Ala His Pro Ala Pro Ala Ala Met
            (additional amino acids)
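
Whether each ligation junction preserves the reading frame can be checked by translating the vector-derived and mcg7-derived sequences together. The sketch below is illustrative only: it assumes the standard genetic code, uses hypothetical function names, transcribes the junction sequences from Constructs (A) to (D) above, and takes the final codon of Construct (D) as ATG, consistent with the Met shown beneath it.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from position 0, ignoring spaces and case."""
    seq = seq.upper().replace(" ", "")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# construct: (vector/linker-derived sequence, mcg7-derived sequence, expected peptide)
JUNCTIONS = {
    "A": ("GGG ATC CCC", "CTG GTC", "GIPLV"),
    "B": ("GAA TTC GGC ACG AG", "C CGA CGG", "EFGTSRR"),
    "C": (
        "ATGGAGCAGAAGCTGATCTCCGAGGAGGACCTG CCCGGGGCAGCT ggatcc",
        "GCAGCCCACCCCGCGCCGGCGGCC ATG",
        "MEQKLISEEDLPGAAGSAAHPAPAAM",
    ),
    "D": ("gga tcc", "GCA GCC CAC CCC GCG CCG GCG GCC ATG", "GSAAHPAPAAM"),
}


def junction_in_frame(vector_part, insert_part, expected):
    """True if the joined sequence translates to the expected peptide."""
    return translate(vector_part + insert_part) == expected


if __name__ == "__main__":
    for name, parts in JUNCTIONS.items():
        print(name, junction_in_frame(*parts))
```

Each junction translates to the additional amino acids listed above followed by MCG7-derived residues, with no stop codon introduced at the fusion point.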



EXAMPLE 16 



Overnight bacterial cultures containing the pGEX plasmid were used to inoculate 500ml of Luria
Broth media containing 50 µg/ml ampicillin. The cultures were grown to an OD of ~0.8 and then
induced with 1mM of IPTG for up to 3 hours at 37°C. The bacteria were pelleted and
resuspended in 15 ml of STE buffer (10mM Tris pH 8.0, 150 mM NaCl and 1mM EDTA) with
1 mg/ml lysozyme. The mixture was left on ice for more than 1 hour and subsequent steps were
performed at 4°C. Protease inhibitors aprotinin, pepstatin and leupeptin were added at final
concentrations of 25 µg/ml, prior to the addition of Triton-X-100 (2% v/v final) and n-lauroyl
sarcosine (1.5% w/v final). The lysate was sonicated for ~1 minute and pelleted at 14,000 x g
for 15 minutes. 100 µl of 50% w/v glutathione-sephadex bead slurry (in PBS) was added per
ml of supernatant. Following a 30 minute incubation at 4°C, the beads were washed three times
with NETN (20mM Tris-HCl pH 8.0, 100mM NaCl, 1mM EDTA, 0.5% NP40), once with
NETN-HS (equivalent to NETN but with 1M NaCl), and once in NETN. The bound protein
was directly analysed by SDS-polyacrylamide gel electrophoresis (PAGE) as described below
or the bound protein was eluted from the beads with the following elution buffer (50mM Tris pH
8.0, 150mM NaCl, 5mM MgCl2, 1mM DTT, 10mM reduced glutathione) for use in GDP release
assays.
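
The buffer volumes used in this procedure follow from the dilution relation C(stock) x V(stock) = C(final) x V(final). The Python sketch below works this through for the 15 ml of STE buffer; the stock concentrations (1 M Tris, 5 M NaCl, 0.5 M EDTA) are assumptions chosen for illustration and are not stated in the text, and the function names are hypothetical.

```python
# Final concentrations of STE buffer as given in the text (mM).
STE_FINAL_MM = {"Tris pH 8.0": 10.0, "NaCl": 150.0, "EDTA": 1.0}

# Hypothetical stock concentrations (mM); assumptions, not taken from the source.
STOCK_MM = {"Tris pH 8.0": 1000.0, "NaCl": 5000.0, "EDTA": 500.0}


def stock_volume_ul(final_mm, stock_mm, final_volume_ml):
    """Volume of stock (µl) satisfying C_stock * V_stock = C_final * V_final."""
    return final_mm * final_volume_ml * 1000.0 / stock_mm


def ste_recipe(final_volume_ml):
    """Stock volumes (µl) for a given final volume of STE buffer (ml)."""
    return {
        name: stock_volume_ul(conc, STOCK_MM[name], final_volume_ml)
        for name, conc in STE_FINAL_MM.items()
    }


def bead_slurry_ul(supernatant_ml, ul_per_ml=100.0):
    """Bead slurry volume at 100 µl per ml of supernatant, as in the text."""
    return supernatant_ml * ul_per_ml


if __name__ == "__main__":
    for name, volume in ste_recipe(15.0).items():
        print(f"{name}: {volume:.0f} µl")
    print(f"slurry for 15 ml supernatant: {bead_slurry_ul(15.0):.0f} µl")
```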



EXAMPLE 17

Twenty microlitres of GST-sepharose-bound MCG7 were added to an equal volume of 2 x
sample loading dye (100mM Tris pH6.8, 2% v/v mercaptoethanol, 4% w/v SDS, 0.2% w/v
bromophenol blue, 20% v/v glycerol), boiled for 5 min and loaded onto a 7.5% w/v SDS-PAGE
gel (Sambrook et al, 1989). The Coomassie brilliant blue stained gel (Sambrook et al, 1989)
typically displayed a protein doublet, running between 87-95 kDa consisting of the MCG7-GST
fusion and a slightly smaller, co-purified contaminating E. coli protein of ~105kDa. The
calculated molecular weight of full-length MCG7 is 77.5 kDa (Construct (D)) and the GST
component has a molecular weight of 26kDa, hence, the recombinant protein runs slightly
smaller than predicted. A Western blot of the same gel probed with anti-GST antibody yields
an MCG7-specific band at the same position as that of the stained gel.

EXAMPLE 18 

Assumptions: (a) GST-Ras molecular weight = 50 kD; (b) Concentration of GST-Ras solution
= 1mg/ml = 20 µM; (c) [3H]-GDP is 1mCi/ml and 13.3Ci/mmol, therefore [3H]-GDP
concentration = 75 µM and 1pmol [3H]-GDP = 15,466 cpm; (d) Elution buffer = Buffer E = 20
mM Tris-Cl, pH7.5; 50mM NaCl; 5mM MgCl2; 1mM DTT (added just before use). Buffer E
+ BSA = Buffer E + 1mg/ml BSA (added just before use).

Mix together, in the following order and mix well after each addition:
10 µl (=10 µg) GST-Ras (@ 1mg/ml in Buffer E), 463 µl Buffer E + BSA, 7 µl [3H]-GDP, 10 µl
490 mM EDTA. Incubate @ RT for 10 min. Add 10 µl 0.5 M MgCl2 and mix well. Incubate
@ RT for 10 min. Place on ice. During the first incubation the excess EDTA concentration is
5mM, during the second incubation the excess Mg concentration is 5mM. The [3H]-GDP
concentration is 1 µM and the final concentration of GST-Ras is 400nM. Thus 20 µl of the final
mix will contain 8pmol of GST-Ras protein. Specific activity of GDP is 15,466 cpm/pmol x
(1/1.4) = 11,047 cpm/pmol.
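
The concentrations quoted in this Example can be reproduced from the stated assumptions and pipetting volumes. The following Python sketch is a worked check only; it assumes that the EDTA stock is 490 mM, which is the value required to give the stated 5 mM excess of EDTA in the first incubation, and that Buffer E contributes 5 mM MgCl2 wherever it is present.

```python
# Stock values stated in the assumptions of this Example.
RAS_MG_PER_ML = 1.0
RAS_MW_DA = 50_000.0
GDP_MCI_PER_ML = 1.0
GDP_CI_PER_MMOL = 13.3
BUFFER_MG_MM = 5.0       # MgCl2 in Buffer E
EDTA_STOCK_MM = 490.0    # assumed to be millimolar (see lead-in)
MGCL2_STOCK_MM = 500.0   # 0.5 M

# Volumes added, in µl.
V_RAS, V_BUFFER, V_GDP, V_EDTA, V_MG = 10.0, 463.0, 7.0, 10.0, 10.0

ras_stock_um = RAS_MG_PER_ML / RAS_MW_DA * 1e6         # mg/ml equals g/l; mol/l to µM
gdp_stock_um = GDP_MCI_PER_ML / GDP_CI_PER_MMOL * 1e3  # (Ci/l)/(Ci/mmol) = mM; mM to µM

v_first = V_RAS + V_BUFFER + V_GDP + V_EDTA            # 490 µl during first incubation
v_final = v_first + V_MG                               # 500 µl after MgCl2 addition

# First incubation: EDTA in excess over the Mg carried in with Buffer E.
mg_first_mm = (V_RAS + V_BUFFER) * BUFFER_MG_MM / v_first
edta_first_mm = V_EDTA * EDTA_STOCK_MM / v_first
excess_edta_mm = edta_first_mm - mg_first_mm

# Second incubation: Mg in excess over EDTA.
mg_final_mm = ((V_RAS + V_BUFFER) * BUFFER_MG_MM + V_MG * MGCL2_STOCK_MM) / v_final
edta_final_mm = V_EDTA * EDTA_STOCK_MM / v_final
excess_mg_mm = mg_final_mm - edta_final_mm

ras_final_nm = ras_stock_um * V_RAS / v_final * 1e3
gdp_final_um = gdp_stock_um * V_GDP / v_final
ras_pmol_in_20ul = ras_final_nm / 1e3 * 20.0           # µM x µl = pmol
specific_activity = 15466 / 1.4                        # cpm per pmol

if __name__ == "__main__":
    print(f"GST-Ras stock: {ras_stock_um:.1f} µM")
    print(f"[3H]-GDP stock: {gdp_stock_um:.1f} µM")
    print(f"excess EDTA (first incubation): {excess_edta_mm:.2f} mM")
    print(f"excess Mg (second incubation): {excess_mg_mm:.2f} mM")
    print(f"final GST-Ras: {ras_final_nm:.0f} nM; final [3H]-GDP: {gdp_final_um:.2f} µM")
    print(f"GST-Ras in 20 µl: {ras_pmol_in_20ul:.1f} pmol")
    print(f"specific activity: {specific_activity:.0f} cpm/pmol")
```

The computed excesses (about 5.2 mM EDTA and 4.9 mM Mg) round to the 5 mM figures given in the text.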

EXAMPLE 19

Exchange Ras with labelled GDP as above. Add unlabelled GTP (stock = 100mM, pH7) to 1
mM. Adjust Mg concentration by adding 5 µl 0.5M EDTA to labelled Ras, 5 µl 0.5M EDTA to
500 µl MCG7, and 5 µl 0.5M EDTA to 500 µl Buffer E + BSA. On ice set up microfuge tubes
with 40 µl Ras-GDP (in triplicate) with 40 µl MCG7 or Buffer E + BSA (control). Transfer tubes
to heat block @ 25°C and incubate for 10, 20 or 30 min. Stop exchange reactions with 1ml of
ice cold Buffer E and place on ice. Pre-soak nitrocellulose filters, pore size 45 µm, in Buffer E.
Assemble the vacuum manifold apparatus (Millipore) with wet filters and plug the wells with
rubber bungs. Switch on the vacuum pump. Remove the first plug, aliquot the sample and once
it has been sucked through, wash the filter with 10ml of ice cold Buffer E. Remove next plug
etc and continue round the manifold. Take manifold apart. Pin the filters to a pin board reserved
for [3H]. Air dry. Take up in 4ml scintillation fluid and count. These studies have been carried
out with a truncated MCG7-GST fusion protein (amino acids 341 of Figure 13a to stop encoded
within construct B).
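
Filter counts from this assay can be converted to picomoles of [3H]-GDP using the specific activity of 11,047 cpm/pmol derived in Example 18. The Python sketch below is one way to do so; it assumes that the counts retained on the filter represent Ras-bound [3H]-GDP, so that exchange activity appears as a loss of counts relative to the buffer control. The function names are hypothetical and the triplicate counts are invented purely for illustration.

```python
from statistics import mean

SPECIFIC_ACTIVITY = 11047.0  # cpm per pmol [3H]-GDP, from Example 18


def pmol_gdp(cpm, background_cpm=0.0):
    """Convert filter counts to pmol of [3H]-GDP retained."""
    return (cpm - background_cpm) / SPECIFIC_ACTIVITY


def percent_gdp_released(sample_cpm, control_cpm):
    """Percentage of label lost relative to the mean of the buffer controls."""
    return 100.0 * (1.0 - mean(sample_cpm) / mean(control_cpm))


if __name__ == "__main__":
    # Invented triplicate counts for one time point; not experimental data.
    control = [40000.0, 41000.0, 39000.0]
    with_mcg7 = [20000.0, 21000.0, 19000.0]
    print(f"control: {pmol_gdp(mean(control)):.2f} pmol bound")
    print(f"+MCG7:   {pmol_gdp(mean(with_mcg7)):.2f} pmol bound")
    print(f"GDP released: {percent_gdp_released(with_mcg7, control):.1f}%")
```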

EXAMPLE 20

A human gene was identified from chromosome 11q13 that encodes a new member of the DnaJ
family of proteins (designated MCG18). This gene (mcg18) is expressed as an ~1.4kb mRNA
(Fig. 28) and is predicted to encode a 241 amino acid product (Fig. 19).

EXAMPLE 21 

MCG18 has partial homology to E. coli dnaJ and other human DnaJ family members in that it 
contains the J domain (Fig. 20). 

EXAMPLE 22 

MCG18 has greatest homology to functionally undefined proteins from C. elegans (Fig. 21) and
S. pombe (Fig. 22) that also feature the J domain but maintain sequence similarity through the
central and C-terminal regions of the proteins.

EXAMPLE 23 

The J domain is proposed to mediate interaction with heat shock protein 70 (Hsp70) and consists
of some 70 amino acids, frequently located at the N-terminus of the protein. One of these
proteins, tumorous imaginal discs (Tid58) from Drosophila virilis (Fig. 23), functions as a
tumour suppressor.

EXAMPLE 24 

A comparison of homology between MCG18 and human DnaJ proteins HDJ-2/HSDJ,
HDJ-1/HSP40 and HSJ1 is shown in Fig. 24.

EXAMPLE 25 

During the sequence characterisation of the VRF/VEGFB promoter region on cosmid CLGW4
[Grimmond et al, 1996], which maps to chromosome 11q13, the inventors identified a sequence
that exactly matched numerous human and mouse expressed sequence tags (ESTs) in the EST
database from a gene which we designated mcg18. EST clones for human (GenBank accession
number T69741, clone 108172; accession number H40901, clone 177008) and mouse mcg18
(accession number W34884, clone 350966; accession number W64183, clone 385535) were
obtained from Genome Systems Inc. and sequenced with the gene-specific primers shown in
Table 7. The EST clones listed in Table 8 were also utilised in generating the full-length coding
sequence for human (Figure 19) and mouse (Figure 25) mcg18. The EST database also
contained mcg18 cDNA entries that were alternately (or partially) spliced, and in order to
understand their ability to encode new polypeptides, the gene structure of mcg18 was determined
by sequencing human and mouse genomic templates with gene-specific primers.

Genomic fragments containing the human [Grimmond et al, 1996] and murine genes [Townson
et al, 1996] have been previously reported. Cosmid CLGW4 contains the entire human gene
and A 121 contains the entire mouse gene, as determined by direct sequencing of the templates
with the oligonucleotides listed in Table 7. Plasmids containing sub-fragments of A 121 and
cosmid CLGW4 were prepared using plasmid purification kits (Qiagen) and sequenced as
described previously [Grimmond et al, 1996; Townson et al, 1996] using primers designed
against cDNA and genomic sequences. The BLAST suite of programs [Altschul et al, 1990]
was used to compare the sequence data against the nucleotide and protein databases at the
National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). The sequence
data were compiled using MacVector 4.2.1 software (IBI-Kodak). ClustalW sequence
alignments [Thompson et al, 1994] were conducted using the Australian National Genome
Information Service computer facility at the University of Sydney, Australia.

The cDNA sequence of human mcg18 (Figure 19) was translated in all possible reading frames 
and compared to the GenBank non-redundant protein database using the program BLASTX 
[Altschul et al, 1990], and the coding region was identified on the basis of its homology to 
the DnaJ family of proteins (Figure 20). The DnaJ domain is encoded within the longest open 
reading frame and the assigned initiation codon is preceded by an in-frame stop codon (Figure 
27). Similar database search results were obtained for the mouse mcg18 cDNA, and the 
alignment of human and mouse protein sequences is shown in Figure 26. MCG18 has greatest 
homology to gene products from C. elegans (Figure 21) and S. pombe (Figure 22). Although 
it shares a similar J-domain, MCG18 does not contain other domains described for the tumour 
suppressor gene from D. virilis (Figure 23), nor is it a homologue of other reported human 
J-domain-containing proteins (Figure 24). 
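
The reading-frame analysis described above (translation in all possible frames, selection of the longest open reading frame, and confirmation that an in-frame stop codon lies upstream of the assigned initiation codon) can be illustrated with a minimal Python sketch. It is not the procedure actually used, which relied on BLASTX; the function names and the toy input sequence are assumptions made purely for illustration, and only the standard genetic code is assumed. 

```python
# Minimal six-frame translation sketch (illustrative helper names, standard genetic code).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Map (strand, offset) to the translation of that reading frame."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = translate(template[offset:])
    return frames


def longest_orf(protein):
    """Return (peptide, has_upstream_stop) for the longest Met-initiated ORF."""
    best_start, best_len = -1, 0
    for start, residue in enumerate(protein):
        if residue != "M":
            continue
        stop = protein.find("*", start)
        end = stop if stop != -1 else len(protein)
        if end - start > best_len:
            best_start, best_len = start, end - start
    if best_start < 0:
        return "", False
    return protein[best_start:best_start + best_len], "*" in protein[:best_start]


if __name__ == "__main__":
    toy = "TAAGCCATGGGGCTTTGTAAGTGA"  # made-up sequence, not taken from the figures
    for frame, protein in six_frames(toy).items():
        print(frame, protein, longest_orf(protein))
```

In this sketch an initiation codon is accepted as the start of the coding region only when the second value returned by longest_orf is True, that is, when a stop codon occurs upstream in the same frame. 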

To determine the expression pattern of mcg18, 15 µg of total cellular RNA (RNeasy Mini Kit, 
Qiagen) from various human cell lines grown in culture was electrophoresed through 1.2% 
MOPS/formaldehyde gels and blotted onto nylon membranes (Amersham) by capillary transfer 
using 20 x SSC (Sambrook et al, 1986). Filters were subsequently UV-fixed and hybridised 
overnight at 65°C to a radiolabelled (32P-dCTP) cDNA probe (Church and Gilbert, 1984) for 
mcg18. After washes in 0.1 x SSC/0.1% w/v SDS at 65°C for 1 hour, the filters were air-dried 
and exposed to X-ray film. This Northern analysis showed that mcg18 is expressed as a 1.4 kb 
message in numerous tissues including breast, ovary, bladder, lung and keratinocytes (Figure 28). 
TABLE 4 
ESTs matching mcg4 

accession number        seq. run       organism                           score   E value    N 

gb|AA399110|AA399110    zt89e06.s1     Soares testis NHT Homo sa...       1136    4.0e-168   2 
gb|N39612|N39612        yy51g06.s1     Homo sapiens cDNA clone 2...       1521    5.3e-168   4 
gb|AA514406|AA514406    nf57d01.s1     NCI_CGAP_Co3 Homo sapiens...       931     5.5e-166   3 
gb|AA544946|AA544946    vk38e02.r1     Soares mouse mammary glan...       1207    8.4e-164   2 
gb|AA450076|AA450076    zx42a04.s1     Soares total fetus Nb2HF8...       691     2.3e-160   4 
gb|AA535731|AA535731    nf88f07.s1     NCI_CGAP_Co3 Homo sapiens...       796     3.5e-158   4 
gb|W79710|W79710        zd86f01.r1     Soares fetal heart NbHH19...       1644    1.1e-157   4 
gb|AA503531|AA503531    nc47e08.s1     NCI_CGAP_Co3 Homo sapiens...       736     4.0e-156   4 
gb|AA450132|AA450132    zx42a04.r1     Soares total fetus Nb2HF8...       1955    3.9e-155   1 
gb|AA398068|AA398068    zt89f06.r1     Soares testis NHT Homo sa...       1315    5.4e-148   2 
gb|W60405|W60405        zd29h08.r1     Soares fetal heart NbHH19...       1022    1.8e-139   4 
gb|W81382|W81382        zd86f01.s1     Soares fetal heart NbHH19...       605     3.5e-125   5 
gb|AA047617|AA047617    zf13f07.s1     Soares fetal heart NbHH19...       922     4.6e-125   2 
gb|AA282175|AA282175    zt02d03.s1     NCI_CGAP_GCB1 Homo sapien...       1577    2.0e-123   1 
gb|AA242159|AA242159    my30d04.r1     Barstead mouse pooled org...       866     7.7e-117   2 
gb|AA068680|AA068680    mm61a05.r1     Stratagene mouse embryoni...       1280    1.6e-98    1 
gb|W46766|W46766        zc36b07.s1     Soares senescent fibrobla...       506     9.6e-92    3 
gb|N93704|N93704        zb51c04.s1     Soares fetal lung NbHL19W...       584     9.0e-91    4 
gb|AA155210|AA155210    mr98e01.r1     Stratagene mouse embryoni...       840     7.6e-87    2 
gb|AA366022|AA366022    EST76915       Pineal gland II Homo sapien...     1077    2.4e-81    1 
gb|AA037691|AA037691    zk34h12.s1     Soares pregnant uterus Nb...       949     2.1e-80    2 
gb|W35374|W35374        zc07h03.s1     Soares parathyroid tumor ...       1016    3.1e-76    1 
dbj|C00696|C00696       HUMGS0008251,  Human Gene Signature, ...          1009    1.2e-75    1 
gb|T98249|T98249        ye59a07.s1     Homo sapiens cDNA clone 1...       998     6.7e-75    1 
gb|W21588|W21588        zb51c04.r1     Soares fetal lung NbHL19W...       484     1.1e-69    4 
gb|H32171|H32171        EST107015      Rattus sp. cDNA 5' end.            828     1.1e-60    1 
gb|AA108092|AA108092    mm89e06.r1     Stratagene mouse embryoni...       782     1.3e-60    2 
gb|AA017857|AA017857    mh44d10.r1     Soares mouse placenta 4Nb...       665     2.5e-60    2 
gb|AA037690|AA037690    zk34h12.r1     Soares pregnant uterus Nb...       540     9.4e-53    2 
gb|AA531006|AA531006    nj07b11.s1     NCI_CGAP_Pr22 Homo sapien...       535     5.4e-48    2 
gb|N46760|N46760        yy51g06.r1     Homo sapiens cDNA clone 2...       665     9.5e-47    1 
gb|W23584|W23584        zc71d03.s1     Soares fetal heart NbHH19...       457     1.8e-44    2 
gb|W42214|W42214        mc69h09.r1     Soares mouse embryo NbME1...       460     1.3e-38    3 
gb|AA244877|AA244877    mx25a04.r1     Soares mouse NML Mus musc...       429     2.9e-25    1 
gb|W32939|W32939        zc07h03.r1     Soares parathyroid tumor ...       320     4.8e-18    1 
TABLE 5 

ESTs matching AA074703 (mcg4-related cDNA) 

Database: Non-redundant Database of GenBank EST Division 
1,222,625 sequences; 449,352,662 total letters. 

                                                                          Smallest 
                                                                            Sum 
                                                                  High   Probability 
Sequences producing High-scoring Segment Pairs:                   Score  P(N)       N 

accession number        seq. run       organism                           score   E value    N 

gb|AA074703|AA074703    zm76g07.r1     Stratagene neuroepitheli...        2071    4.0e-167   1 
gb|AA068680|AA068680    mm61a05.r1     Stratagene mouse embryon...        1270    4.4e-145   4 
gb|AA134788|AA134788    zm81g02.r1     Stratagene neuroepitheli...        946     1.3e-144   5 
gb|AA399110|AA399110    zt89e06.s1     Soares testis NHT Homo s...        520     8.7e-119   6 
gb|N39612|N39612        yy51g06.s1     Homo sapiens cDNA clone ...        582     9.6e-110   7 
gb|AA282175|AA282175    zt02d03.s1     NCI_CGAP_GCB1 Homo sapie...        771     9.4e-80    3 
gb|W81382|W81382        zd86f01.s1     Soares fetal heart NbHH1...        329     1.6e-75    6 
gb|AA544946|AA544946    vk38e02.r1     Soares mouse mammary gla...        644     9.6e-63    2 
gb|W35374|W35374        zc07h03.s1     Soares parathyroid tumor...        294     4.5e-42    4 
gb|W57106|W57106        md57c12.r1     Soares mouse embryo NbME...        394     1.9e-30    2 
gb|AA244877|AA244877    mx25a04.r1     Soares mouse NML Mus mus...        162     2.1e-27    4 
gb|AA017857|AA017857    mh44d10.r1     Soares mouse placenta 4N...        230     3.7e-23    3 
gb|AA531006|AA531006    nj07b11.s1     NCI_CGAP_Pr22 Homo sapie...        139     2.3e-19    3 
gb|H32171|H32171        EST107015      Rattus sp. cDNA 5' end.            207     2.6e-10    2 
gb|W79710|W79710        zd86f01.r1     Soares fetal heart NbHH1...        157     0.0073     1 
TABLE 6 
mcg7-specific oligonucleotides 

name              sequence 5' to 3'               SEQ ID NOs. 

[illegible]       [illegible]                     SEQ ID NO:25 
[illegible]       [illegible]                     SEQ ID NO:26 
[illegible]       [illegible]                     SEQ ID NO:27 
MCG7 CA FOR       AGG TGG AGA ATG GTC AAG G       SEQ ID NO:28 
MCG7-GEF-REV      GTC ATA GTC TGT CTC CTA CT      SEQ ID NO:29 
MCG7 GEF FOR      ACA TAG ACA GCG TGC CTA CC      SEQ ID NO:30 
MCG7-PKC-REV      TAC AAC CTT AGG GAC ACC AG      SEQ ID NO:31 
MCG7-PKC-FOR      TGC TGA GCC TGC TCA CGG TG      SEQ ID NO:32 
T09103F           CAA GTG AAC AGC ACG TCC         SEQ ID NO:33 
M7F               GAC TAT CTC AAG GAC CAG CTG     SEQ ID NO:34 
MCG7UF            GGT TCG GTC CGA GCC CGG         SEQ ID NO:35 
SGCADRV2          GGA GCG ATA CTC CAA GTA GGT     SEQ ID NO:36 
TABLE 7 

mcg18-SPECIFIC OLIGONUCLEOTIDES 

name          sequence 5' to 3' 

HVESTF        AGC GGG CCA GGC CCC TTC [SEQ ID NO:37] 
HV195F        CAT CCT GGT CCA ATG CGC TC [SEQ ID NO:38] 
HV387F2       GCA CTG AGG AAG TTA AAC GAG C [SEQ ID NO:39] 
HV408R        GCT CGT TTA ACT TCC TCA GTG C [SEQ ID NO:40] 
EXONIREV      GCT CAG CTC CAC AAA GCG GCT [SEQ ID NO:41] 
HVEST426F     ACC AGC TCC GCT CAG GTA G [SEQ ID NO:42] 
HVEST623R     TCC AGG AGC TGT GTG TTT GG [SEQ ID NO:43] 
SGVESTF3      CCA GTT TCA CAG CGT GAG G [SEQ ID NO:44] 
HVEST631R     CAG CAT GAG GAG GAG GCA G [SEQ ID NO:45] 
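
As listed in Table 7, HV387F2 [SEQ ID NO:39] and HV408R [SEQ ID NO:40] are exact reverse complements of one another, i.e. they anneal to opposite strands at the same position. The short Python sketch below checks that relationship and reports GC content; the helper names are illustrative assumptions and form no part of the described method. 

```python
# Illustrative check of two Table 7 primers; function names are hypothetical helpers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer):
    """Remove the triplet spacing used in the table and upper-case the bases."""
    return "".join(primer.split()).upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    seq = clean(seq)
    return (seq.count("G") + seq.count("C")) / len(seq)


HV387F2 = "GCA CTG AGG AAG TTA AAC GAG C"  # SEQ ID NO:39
HV408R = "GCT CGT TTA ACT TCC TCA GTG C"   # SEQ ID NO:40

if __name__ == "__main__":
    print(reverse_complement(HV387F2) == clean(HV408R))
    print(gc_fraction(HV387F2))
```

The same comparison can be applied to any forward/reverse pair in the table to see whether two primers target the same site on opposite strands. 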
TABLE 8 

EST CLONE SEQUENCES USED TO GENERATE HUMAN AND MOUSE 
mcg18 cDNA SEQUENCE COMPOSITES 

EST clone number    organism    GenBank accession number 

lg2815              human       D45683 
0O1-T2-18           human       F17225 
273748              human       N37043 
177008              human       H40901 and H40939 
258011              human       N30776 
276887              human       N44004 
108172              human       T69741 
307529              human       W21083 and W32579 
342027              human       W60283 
354288              mouse       W44038 
350966              mouse       W34884 
426261              mouse       AA002868 
368185              mouse       W53911 
385535              mouse       W64183 
404472              mouse       W82959 
406437              mouse       W83482 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: (OTHER THAN US): The Council of The Queensland Institute of 
Medical Research 

(US ONLY): HAYWARD Nicholas, SELINS Ginters, GRIMMOND Sean, 
GARTSIDE Michael and HANCOCK, John 

(ii) TITLE OF INVENTION: NOVEL GENE AND USES THEREFOR 

(iii) NUMBER OF SEQUENCES: 45 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: DAVIES COLLISON CAVE 

(B) STREET: 1 LITTLE COLLINS STREET 

(C) CITY: MELBOURNE 

(D) STATE: VICTORIA 

(E) COUNTRY: AUSTRALIA 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1459 

(B) FILING DATE: 22-JAN-1998 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PP1460 

(B) FILING DATE: 22-JAN-1998 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

Cys Xaa Xaa Cys Xaa Gly Xaa Gly 
1               5 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1242 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 30..959 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53 
                                Met Gly Leu Cys Lys Cys Pro Lys 
                                1               5 

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101 
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys 
    10                  15                  20 

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149 
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr 
25                  30                  35                  40 

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197 
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys 
                45                  50                  55 

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245 
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp 
            60                  65                  70 

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293 
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg 
        75                  80                  85 

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341 
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile 
    90                  95                  100 

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389 
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu 
105                 110                 115                 120 
AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437 
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu 
                125                 130                 135 

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485 
Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp 
            140                 145                 150 

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533 
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu 
        155                 160                 165 

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581 
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro 
    170                 175                 180 

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629 
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His 
185                 190                 195                 200 

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677 
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp 
                205                 210                 215 

ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725 
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp 
            220                 225                 230 

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773 
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu 
        235                 240                 245 

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821 
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg 
    250                 255                 260 

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869 
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu 
265                 270                 275                 280 

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917 
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn 
                285                 290                 295 

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962 
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser 
            300                 305                 310 

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022 

AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082 

CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142 

GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202 

ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 310 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

Met Gly Leu Cys Lys Cys Pro Lys Arg Lys Val Thr Asn Leu Phe Cys 
1               5                   10                  15 

Phe Glu His Arg Val Asn Val Cys Glu His Cys Leu Val Ala Asn His 
            20                  25                  30 

Ala Lys Cys Ile Val Gln Ser Tyr Leu Gln Trp Leu Gln Asp Ser Asp 
        35                  40                  45 

Tyr Asn Pro Asn Cys Arg Leu Cys Asn Ile Pro Leu Ala Ser Arg Glu 
    50                  55                  60 

Thr Thr Arg Leu Val Cys Tyr Asp Leu Phe His Trp Ala Cys Leu Asn 
65                  70                  75                  80 

Glu Arg Ala Ala Gln Leu Pro Arg Asn Thr Ala Pro Ala Gly Tyr Gln 
                85                  90                  95 

Cys Pro Ser Cys Asn Gly Pro Ile Phe Pro Pro Thr Asn Leu Ala Gly 
            100                 105                 110 

Pro Val Ala Ser Ala Leu Arg Glu Lys Leu Ala Thr Val Asn Trp Ala 
        115                 120                 125 

Arg Ala Gly Leu Gly Leu Pro Leu Ile Asp Glu Val Val Ser Pro Glu 
    130                 135                 140 

Pro Glu Pro Leu Asn Thr Ser Asp Phe Ser Asp Trp Ser Ser Phe Asn 
145                 150                 155                 160 

Ala Ser Ser Thr Pro Gly Pro Glu Glu Val Asp Ser Ala Ser Ala Ala 
                165                 170                 175 

Pro Ala Phe Tyr Ser Arg Ala Pro Arg Pro Pro Ala Ser Pro Gly Arg 
            180                 185                 190 

Pro Glu Gln His Thr Val Ile His Met Gly Asn Pro Glu Pro Leu Thr 
        195                 200                 205 

His Ala Pro Arg Lys Val Tyr Asp Thr Arg Asp Asp Asp Arg Thr Pro 
    210                 215                 220 

Gly Leu His Gly Asp Cys Asp Asp Asp Lys Tyr Arg Arg Arg Pro Ala 
225                 230                 235                 240 

Leu Gly Trp Leu Ala Arg Leu Leu Arg Ser Arg Ala Gly Ser Arg Lys 
                245                 250                 255 

Arg Pro Leu Thr Leu Leu Gln Arg Ala Gly Leu Leu Leu Leu Leu Gly 
            260                 265                 270 

Leu Leu Gly Phe Leu Ala Leu Leu Ala Leu Met Ser Arg Leu Gly Arg 
        275                 280                 285 

Ala Ala Ala Asp Ser Asp Pro Asn Leu Asp Pro Leu Met Asn Pro His 
    290                 295                 300 

Ile Arg Val Gly Pro Ser 
305                 310 
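
The CDS feature given for SEQ ID NO:2 (LOCATION: 30..959) and the length stated for SEQ ID NO:3 (310 amino acids) can be cross-checked with simple arithmetic, shown here as a short Python sketch. The function name is an illustrative assumption; the coordinates are those stated in the listing. 

```python
# Hypothetical helper: consistency check between a CDS range and a protein length.
def cds_codons(start, end):
    """Return (nucleotides, codons) for a 1-based, inclusive CDS range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length, length // 3


if __name__ == "__main__":
    # SEQ ID NO:2, LOCATION 30..959 -> 930 nucleotides, 310 codons
    print(cds_codons(30, 959))
```

The 310 codons agree with the 310 residues of SEQ ID NO:3, and the TGA stop codon then occupies positions 960 to 962 of SEQ ID NO:2. 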



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2415 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 3..2188 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

CG ATT TCA TTC CTC GCT CCC CAC AGG TCC CTC TCC CCA AAA TAT TCC 47 
   Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser 
   1               5                   10                  15 

CAT CTT GTC CTA GCC CAT CCC CCA GAC TAT CTC AAG GAC CAG CTG TCC 95 
His Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser 
                20                  25                  30 

CCA CGC CCC CGA CCT CCA CTA GGC CTG TGC CAC CCG CTG CCT GCA GGA 143 
Pro Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly 
            35                  40                  45 

AGA CGC CCG GTC CCG GGC CGG GTT AGC CCC ATG GGA ACG CAG CGC CTG 191 
Arg Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu 
        50                  55                  60 

TGT GGC CGC GGG ACT CAA GGC TGG CCT GGC TCA AGT GAA CAG CAC GTC 239 
Cys Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val 
    65                  70                  75 

CAG GAG GCG ACC TCG TCC GCG GGT TTG CAT TCT GGG GTG GAC GAG CTG 287 
Gln Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu 
80                  85                  90                  95 

GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC CTG GGC 335 
Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly 
                100                 105                 110 

CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC CTG GAC 383 
Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp 
            115                 120                 125 

AAG GGC TGC ACG GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC 431 
Lys Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe 
        130                 135                 140 

GAT GAC TCC GGG AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC 479 
Asp Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu 
    145                 150                 155 

ATG ATG CAC CCC TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG 527 
Met Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu 
160                 165                 170                 175 

CTC CAC ATC TAC CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG 575 
Leu His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln 
                180                 185                 190 

GTG AAA ACG TGC CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG 623 
Val Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala 
            195                 200                 205 
GAG TTT GAC TTG AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG 671 
Glu Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys 
        210                 215                 220 

GCT CTG CTA GAC CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC 719 
Ala Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp 
    225                 230                 235 

ATA GAC AGC GTC CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG 767 
Ile Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg 
240                 245                 250                 255 

AAC CCT GTG GGA CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC 815 
Asn Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His 
                260                 265                 270 

CTG GAG CCC ATG GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC 863 
Leu Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg 
            275                 280                 285 

TCC TTC TGC AAG ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT 911 
Ser Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His 
        290                 295                 300 

GGC TGC ACT GTG GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC 959 
Gly Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe 
    305                 310                 315 

AAC AGC GTC TCA CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA 1007 
Asn Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr 
320                 325                 330                 335 

GCC CCG CAG CGG GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG 1055 
Ala Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu 
                340                 345                 350 

AAG CTG CTA CAG CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG 1103 
Lys Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly 
            355                 360                 365 

GGC CTG AGC CAC AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC 1151 
Gly Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His 
        370                 375                 380 

GTT AGC CCT GAG ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG 1199 
Val Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val 
    385                 390                 395 

ACG GCG ACA GGC AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT 1247 
Thr Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys 
400                 405                 410                 415 

GTG GGC TTC CGC TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG 1295 
Val Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val 
                420                 425                 430 

GCC CTG CAG CTG GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG 1343 
Ala Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg 
            435                 440                 445 

CTC AAC GGG GCC AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG 1391 
Leu Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu 
        450                 455                 460 

GCC ATG GTG ACC AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG 1439 
Ala Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu 
    465                 470                 475 
CTG AGC CTG CTC ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG 1487 
Leu Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu 
480                 485                 490                 495 

CTG TAC CAG CTG TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA 1535 
Leu Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro 
                500                 505                 510 

ACC AGC CCC ACG AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG 1583 
Thr Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu 
            515                 520                 525 

GAG TGG ACC TCG GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG 1631 
Glu Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val 
        530                 535                 540 

GAG CAC ATC GAG AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC 1679 
Glu His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val 
    545                 550                 555 

GAT GGG GAT GGC CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG 1727 
Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly 
560                 565                 570                 575 

AAC TTC CCT TAC CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT 1775 
Asn Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp 
                580                 585                 590 

GGC TGC ATC AGC AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC 1823 
Gly Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser 
            595                 600                 605 

TCT GTG TTG GGG GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC 1871 
Ser Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser 
        610                 615                 620 

AAC TCC TTG CGC CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG 1919 
Asn Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu 
    625                 630                 635 

GGC ATC TAC AAG CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC 1967 
Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys 
640                 645                 650                 655 

CAC AAG CAG TGC AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC 2015 
His Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala 
                660                 665                 670 

CAG AGT GTG AGC CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC 2063 
Gln Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His 
            675                 680                 685 

AGC CAC CAT CAC CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG 2111 
Ser His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg 
        690                 695                 700 

CGA GGC TCC AGG CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG 2159 
Arg Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val 
    705                 710                 715 

GAG GAT GGG GTG TTT GAC ATC CAC TTG TA ATAGATGCTG TGGTTGGATC 2208 
Glu Asp Gly Val Phe Asp Ile His Leu 
720                 725 

AAGGACTCAT TCCTGCCTTG GAGAAAATAC TTCAACCAGA GCAGGGAGCC TGGGGGTGTC 2268 

GGGGCAGGAG GCTGGGGATG GGGGTGGGAT ATGAGGGTGG CATGCAGCTG AGGGCAGGGC 2328 

CAGGGCTGGT GTCCCTAAGG TTGTACAGAC TCTTGTGAAT ATTTGTATTT TCCAGATGGA 2388 

ATAAAAAGGC CCGTGTAATT AACCTTC 2415 

(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 728 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

Ile Ser Phe Leu Ala Pro His Arg Ser Leu Ser Pro Lys Tyr Ser His 
1               5                   10                  15 

Leu Val Leu Ala His Pro Pro Asp Tyr Leu Lys Asp Gln Leu Ser Pro 
            20                  25                  30 

Arg Pro Arg Pro Pro Leu Gly Leu Cys His Pro Leu Pro Ala Gly Arg 
        35                  40                  45 

Arg Pro Val Pro Gly Arg Val Ser Pro Met Gly Thr Gln Arg Leu Cys 
    50                  55                  60 

Gly Arg Gly Thr Gln Gly Trp Pro Gly Ser Ser Glu Gln His Val Gln 
65                  70                  75                  80 

Glu Ala Thr Ser Ser Ala Gly Leu His Ser Gly Val Asp Glu Leu Gly 
                85                  90                  95 

Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser Leu Gly Pro 
            100                 105                 110 

Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp Leu Asp Lys 
        115                 120                 125 

Gly Cys Thr Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp 
    130                 135                 140 

Asp Ser Gly Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met 
145                 150                 155                 160 

Met His Pro Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu 
                165                 170                 175 

His Ile Tyr Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val 
            180                 185                 190 

Lys Thr Cys His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu 
        195                 200                 205 

Phe Asp Leu Asn Pro Glu Leu Ala Glu Gln Ile Lys Glu Leu Lys Ala 
    210                 215                 220 

Leu Leu Asp Gln Glu Gly Asn Arg Arg His Ser Ser Leu Ile Asp Ile 
225                 230                 235                 240 

Asp Ser Val Pro Thr Tyr Lys Trp Lys Arg Gln Val Thr Gln Arg Asn 
                245                 250                 255 

Pro Val Gly Gln Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu 
            260                 265                 270 
Glu Pro Met Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser 
        275                 280                 285 

Phe Cys Lys Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly 
    290                 295                 300 

Cys Thr Val Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn 
305                 310                 315                 320 

Ser Val Ser Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala 
                325                 330                 335 

Pro Gln Arg Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys 
            340                 345                 350 

Leu Leu Gln Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly 
        355                 360                 365 

Leu Ser His Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val 
    370                 375                 380 

Ser Pro Glu Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr 
385                 390                 395                 400 

Ala Thr Gly Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val 
                405                 410                 415 

Gly Phe Arg Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala 
            420                 425                 430 

Leu Gln Leu Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu 
        435                 440                 445 

Asn Gly Ala Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala 
    450                 455                 460 

Met Val Thr Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu 
465                 470                 475                 480 

Ser Leu Leu Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu 
                485                 490                 495 

Tyr Gln Leu Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr 
            500                 505                 510 

Ser Pro Thr Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu 
        515                 520                 525 

Trp Thr Ser Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu 
    530                 535                 540 

His Ile Glu Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp 
545                 550                 555                 560 

Gly Asp Gly His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn 
                565                 570                 575 

Phe Pro Tyr Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly 
            580                 585                 590 

Cys Ile Ser Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser 
        595                 600                 605 

Val Leu Gly Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn 
    610                 615                 620 

Ser Leu Arg Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly 
625                 630                 635                 640 

Ile Tyr Lys Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His 
                645                 650                 655 

Lys Gln Cys Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln 
            660                 665                 670 

Ser Val Ser Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser 
        675                 680                 685 

His His His Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg 
    690                 695                 700 

Gly Ser Arg Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu 
705                 710                 715                 720 

Asp Gly Val Phe Asp Ile His Leu 
                725 

(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2309 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 254..2083 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAGC CCCATGGGAA 180 

CGGGGTTCGG TCCGAGCCCG GTGGGAGGCT CCCGGAGCGC AGCCTGGGCC CAGCCCACCC 240 

CGCGCCGGCG GCC ATG GCA GGC ACC CTG GAC CTG GAC AAG GGC TGC ACG 289 
               Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr 
               1               5                   10 

GTG GAG GAG CTG CTC CGC GGG TGC ATC GAA GCC TTC GAT GAC TCC GGG 337 
Val Glu Glu Leu Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly 
        15                  20                  25 

AAG GTG CGG GAC CCG CAG CTG GTG CGC ATG TTC CTC ATG ATG CAC CCC 385 
Lys Val Arg Asp Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro 
    30                  35                  40 

TGG TAC ATC CCC TCC TCT CAG CTG GCG GCC AAG CTG CTC CAC ATC TAC 433 
Trp Tyr Ile Pro Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr 
45                  50                  55                  60 

CAA CAA TCC CGG AAG GAC AAC TCC AAT TCC CTG CAG GTG AAA ACG TGC 481 
Gln Gln Ser Arg Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys 
                65                  70                  75 

CAC CTG GTC AGG TAC TGG ATC TCC GCC TTC CCA GCG GAG TTT GAC TTG 529 
His Leu Val Arg Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu 
            80                  85                  90 

AAC CCG GAG TTG GCT GAG CAG ATC AAG GAG CTG AAG GCT CTG CTA GAC 577 
Asn Pro Glu Leu Ala Glu Gin lie Lys Glu Leu Lys Ala Leu Leu Asp 
95 100 105 

CAA GAA GGG AAC CGA CGG CAC AGC AGC CTA ATC GAC ATA GAC AGC GTC 625 
Gin Glu Gly Asn Arg Arg His Ser Ser Leu lie Asp lie Asp Ser Val 
110 115 120 

CCT ACC TAC AAG TGG AAG CGG CAG GTG ACT CAG CGG AAC CCT GTG GGA 673 
Pro Thr Tyr Lys Trp Lys Arg Gin Val Thr Gin Arg Asn Pro Val Gly 
125 130 135 140 

CAG AAA AAG CGC AAG ATG TCC CTG TTG TTT GAC CAC CTG GAG CCC ATG 721 
Gin Lys Lys Arg Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met 
145 150 155 

GAG CTG GCG GAG CAT CTC ACC TAC TTG GAG TAT CGC TCC TTC TGC AAG 769 
Glu Leu Ala Glu His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys 
160 165 170 

ATC CTG TTT CAG GAC TAT CAC AGT TTC GTG ACT CAT GGC TGC ACT GTG 817 
Ile Leu Phe Gln Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val 
175 180 185 

GAC AAC CCC GTC CTG GAG CGG TTC ATC TCC CTC TTC AAC AGC GTC TCA 865 
Asp Asn Pro Val Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser 
190 195 200 

CAG TGG GTG CAG CTC ATG ATC CTC AGC AAA CCC ACA GCC CCG CAG CGG 913 
Gln Trp Val Gln Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg 
205 210 215 220 

GCC CTG GTC ATC ACA CAC TTT GTC CAC GTG GCG GAG AAG CTG CTA CAG 961 
Ala Leu Val Ile Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln 
225 230 235 

CTG CAG AAC TTC AAC ACG CTG ATG GCA GTG GTC GGG GGC CTG AGC CAC 1009 
Leu Gln Asn Phe Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His 
240 245 250 

AGC TCC ATC TCC CGC CTC AAG GAG ACC CAC AGC CAC GTT AGC CCT GAG 1057 
Ser Ser Ile Ser Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu 
255 260 265 

ACC ATC AAG CTC TGG GAG GGT CTC ACG GAA CTA GTG ACG GCG ACA GGC 1105 
Thr Ile Lys Leu Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly 
270 275 280 

AAC TAT GGC AAC TAC CGG CGT CGG CTG GCA GCC TGT GTG GGC TTC CGC 1153 
Asn Tyr Gly Asn Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg 
285 290 295 300 

TTC CCG ATC CTG GGT GTG CAC CTC AAG GAC CTG GTG GCC CTG CAG CTG 1201 
Phe Pro Ile Leu Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu 
305 310 315 

GCA CTG CCT GAC TGG CTG GAC CCA GCC CGG ACC CGG CTC AAC GGG GCC 1249 
Ala Leu Pro Asp Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala 
320 325 330 

AAG ATG AAG CAG CTC TTT AGC ATC CTG GAG GAG CTG GCC ATG GTG ACC 1297 
Lys Met Lys Gln Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr 
335 340 345 

AGC CTG CGG CCA CCA GTA CAG GCC AAC CCC GAC CTG CTG AGC CTG CTC 1345 
Ser Leu Arg Pro Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu 
350 355 360 

ACG GTG TCT CTG GAT CAG TAT CAG ACG GAG GAT GAG CTG TAC CAG CTG 1393 
Thr Val Ser Leu Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu 
365 370 375 380 

TCC CTG CAG CGG GAG CCG CGC TCC AAG TCC TCG CCA ACC AGC CCC ACG 1441 
Ser Leu Gln Arg Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr 
385 390 395 

AGT TGC ACC CCA CCA CCC CGG CCC CCG GTA CTG GAG GAG TGG ACC TCG 1489 
Ser Cys Thr Pro Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser 
400 405 410 

GCT GCC AAA CCC AAG CTG GAT CAG GCC CTC GTG GTG GAG CAC ATC GAG 1537 
Ala Ala Lys Pro Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu 
415 420 425 

AAG ATG GTG GAG TCT GTG TTC CGG AAC TTT GAC GTC GAT GGG GAT GGC 1585 
Lys Met Val Glu Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly 
430 435 440 

CAC ATC TCA CAG GAA GAA TTC CAG ATC ATC CGT GGG AAC TTC CCT TAC 1633 
His Ile Ser Gln Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr 
445 450 455 460 

CTC AGC GCC TTT GGG GAC CTC GAC CAG AAC CAG GAT GGC TGC ATC AGC 1681 
Leu Ser Ala Phe Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser 
465 470 475 

AGG GAG GAG ATG GTT TCC TAT TTC CTG CGC TCC AGC TCT GTG TTG GGG 1729 
Arg Glu Glu Met Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly 
480 485 490 

GGG CGC ATG GGC TTC GTA CAC AAC TTC CAG GAG AGC AAC TCC TTG CGC 1777 
Gly Arg Met Gly Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg 
495 500 505 

CCC GTC GCC TGC CGC CAC TGC AAA GCC CTG ATC CTG GGC ATC TAC AAG 1825 
Pro Val Ala Cys Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys 
510 515 520 

CAG GGC CTC AAA TGC CGA GCC TGT GGA GTG AAC TGC CAC AAG CAG TGC 1873 
Gln Gly Leu Lys Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys 
525 530 535 540 

AAG GAT CGC CTG TCA GTT GAG TGT CGG CGC AGG GCC CAG AGT GTG AGC 1921 
Lys Asp Arg Leu Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser 
545 550 555 

CTG GAG GGG TCT GCA CCC TCA CCC TCA CCC ATG CAC AGC CAC CAT CAC 1969 
Leu Glu Gly Ser Ala Pro Ser Pro Ser Pro Met His Ser His His His 
560 565 570 

CGC GCC TTC AGC TTC TCT CTG CCC CGC CCT GGC AGG CGA GGC TCC AGG 2017 
Arg Ala Phe Ser Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg 
575 580 585 

CCT CCA GAG ATC CGT GAG GAG GAG GTA CAG ACG GTG GAG GAT GGG GTG 2065 
Pro Pro Glu Ile Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val 
590 595 600 

TTT GAC ATC CAC TTG TAATAGATGC TGTGGTTGGA TCAAGGACTC ATTCCTGCCT 2120 
Phe Asp Ile His Leu 
605 610 

TGGAGAAAAT ACTTCAACCA GAGCAGGGAG CCTGGGGGTG TCGGGGCAGG AGGCTGGGGA 2180 

TGGGGGTGGG ATATGAGGGT GGCATGCAGC TGAGGGCAGG GCCAGGGCTG GTGTCCCTAA 2240 

GGTTGTACAG ACTCTTGTGA ATATTTGTAT TTTCCAGATG GAATAAAAAG GCCCGTGTAA 2300 

TTAACCTTC 2309 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 609 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Met Ala Gly Thr Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
1 5 10 15 

Leu Arg Gly Cys Ile Glu Ala Phe Asp Asp Ser Gly Lys Val Arg Asp 
20 25 30 

Pro Gln Leu Val Arg Met Phe Leu Met Met His Pro Trp Tyr Ile Pro 
35 40 45 

Ser Ser Gln Leu Ala Ala Lys Leu Leu His Ile Tyr Gln Gln Ser Arg 
50 55 60 

Lys Asp Asn Ser Asn Ser Leu Gln Val Lys Thr Cys His Leu Val Arg 
65 70 75 80 

Tyr Trp Ile Ser Ala Phe Pro Ala Glu Phe Asp Leu Asn Pro Glu Leu 
85 90 95 

Ala Glu Gln Ile Lys Glu Leu Lys Ala Leu Leu Asp Gln Glu Gly Asn 
100 105 110 

Arg Arg His Ser Ser Leu Ile Asp Ile Asp Ser Val Pro Thr Tyr Lys 
115 120 125 

Trp Lys Arg Gln Val Thr Gln Arg Asn Pro Val Gly Gln Lys Lys Arg 
130 135 140 

Lys Met Ser Leu Leu Phe Asp His Leu Glu Pro Met Glu Leu Ala Glu 
145 150 155 160 

His Leu Thr Tyr Leu Glu Tyr Arg Ser Phe Cys Lys Ile Leu Phe Gln 
165 170 175 

Asp Tyr His Ser Phe Val Thr His Gly Cys Thr Val Asp Asn Pro Val 
180 185 190 

Leu Glu Arg Phe Ile Ser Leu Phe Asn Ser Val Ser Gln Trp Val Gln 
195 200 205 

Leu Met Ile Leu Ser Lys Pro Thr Ala Pro Gln Arg Ala Leu Val Ile 
210 215 220 

Thr His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe 
225 230 235 240 

Asn Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser 
245 250 255 

Arg Leu Lys Glu Thr His Ser His Val Ser Pro Glu Thr Ile Lys Leu 
260 265 270 

Trp Glu Gly Leu Thr Glu Leu Val Thr Ala Thr Gly Asn Tyr Gly Asn 
275 280 285 

Tyr Arg Arg Arg Leu Ala Ala Cys Val Gly Phe Arg Phe Pro Ile Leu 
290 295 300 

Gly Val His Leu Lys Asp Leu Val Ala Leu Gln Leu Ala Leu Pro Asp 
305 310 315 320 

Trp Leu Asp Pro Ala Arg Thr Arg Leu Asn Gly Ala Lys Met Lys Gln 
325 330 335 

Leu Phe Ser Ile Leu Glu Glu Leu Ala Met Val Thr Ser Leu Arg Pro 
340 345 350 

Pro Val Gln Ala Asn Pro Asp Leu Leu Ser Leu Leu Thr Val Ser Leu 
355 360 365 

Asp Gln Tyr Gln Thr Glu Asp Glu Leu Tyr Gln Leu Ser Leu Gln Arg 
370 375 380 

Glu Pro Arg Ser Lys Ser Ser Pro Thr Ser Pro Thr Ser Cys Thr Pro 
385 390 395 400 

Pro Pro Arg Pro Pro Val Leu Glu Glu Trp Thr Ser Ala Ala Lys Pro 
405 410 415 

Lys Leu Asp Gln Ala Leu Val Val Glu His Ile Glu Lys Met Val Glu 
420 425 430 

Ser Val Phe Arg Asn Phe Asp Val Asp Gly Asp Gly His Ile Ser Gln 
435 440 445 

Glu Glu Phe Gln Ile Ile Arg Gly Asn Phe Pro Tyr Leu Ser Ala Phe 
450 455 460 

Gly Asp Leu Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met 
465 470 475 480 

Val Ser Tyr Phe Leu Arg Ser Ser Ser Val Leu Gly Gly Arg Met Gly 
485 490 495 

Phe Val His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys 
500 505 510 

Arg His Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys 
515 520 525 

Cys Arg Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu 
530 535 540 

Ser Val Glu Cys Arg Arg Arg Ala Gln Ser Val Ser Leu Glu Gly Ser 
545 550 555 560 

Ala Pro Ser Pro Ser Pro Met His Ser His His His Arg Ala Phe Ser 
565 570 575 

Phe Ser Leu Pro Arg Pro Gly Arg Arg Gly Ser Arg Pro Pro Glu Ile 
580 585 590 

Arg Glu Glu Glu Val Gln Thr Val Glu Asp Gly Val Phe Asp Ile His 
595 600 605 

Leu 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 832 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 11..733 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49 
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp 
1 5 10 

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97 
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg 
15 20 25 

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145 
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala 
30 35 40 45 

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193 
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu 
50 55 60 

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241 
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val 
65 70 75 

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289 
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg 
80 85 90 

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337 
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg 
95 100 105 

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385 
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr 
110 115 120 125 

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433 
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln 
130 135 140 

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481 
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu 
145 150 155 

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529 
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile 
160 165 170 

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577 
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys 
175 180 185 

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625 
Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg 
190 195 200 205 

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673 
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg 
210 215 220 

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721 
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg 
225 230 235 

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773 
Gly Ala Gly Pro * 
240 

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832 
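
The feature table for SEQ ID NO: 8 gives a CDS location of 11..733, and the corresponding protein (SEQ ID NO: 9) is stated to be 241 amino acids long. As a simple consistency check, the sketch below counts the codons spanned by a 1-based, inclusive CDS location. The function name is hypothetical and is used here only for illustration; it is not part of the specification.

```python
def cds_codon_count(start: int, end: int) -> int:
    """Return the number of complete codons in a 1-based, inclusive CDS location.

    Hypothetical helper for illustration only.
    """
    length = end - start + 1  # inclusive coordinates
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a positive multiple of 3")
    return length // 3


# CDS location of SEQ ID NO: 8 as given in its feature table (11..733).
codons = cds_codon_count(11, 733)
print(codons)  # 723 nucleotides / 3 = 241 codons
```

The location excludes the TGA stop codon that follows position 733, so the codon count equals the stated protein length.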

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp Pro Arg Asn 
1 5 10 15 

Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg Ser Arg Pro 
20 25 30 

Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala Ser Thr Glu 
35 40 45 

Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu His Pro Asp 
50 55 60 

Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val Glu Leu Ser 
65 70 75 80 

Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg Ser Tyr Asp 
85 90 95 

Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg Thr Thr Val 
100 105 110 

His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr Pro Pro Asn 
115 120 125 

Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln Gly Pro Gln 
130 135 140 

Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu Gly Tyr Cys 
145 150 155 160 

Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile Ala Phe Arg 
165 170 175 

Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys Asp Arg Ile 
180 185 190 

Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg Ala Asn Arg 
195 200 205 

Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg Gln Pro Pro 
210 215 220 

Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg Gly Ala Gly 
225 230 235 240 

Pro 



SEQ ID Nos: 10-18 25-36 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 300 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 170..300 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAGGCC 120 

TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCGGGTTAG CCC CAT 175 

Pro His 
1 

GGG AAC GGG GTT CGG TCC GAG CCC GGT GGG AGG CTC CCG GAG CGC AGC 223 
Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu Arg Ser 
5 10 15 

CTG GGC CCA GCC CAC CCC GCG CCG GCG GCC ATG GCA GGC ACC CTG GAC 271 
Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr Leu Asp 
20 25 30 

CTG GAC AAG GGC TGC ACG GTG GAG GAG CT 300 
Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Pro His Gly Asn Gly Val Arg Ser Glu Pro Gly Gly Arg Leu Pro Glu 
1 5 10 15 

Arg Ser Leu Gly Pro Ala His Pro Ala Pro Ala Ala Met Ala Gly Thr 
20 25 30 

Leu Asp Leu Asp Lys Gly Cys Thr Val Glu Glu Leu 
35 40 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGGATCCCCC TGGTC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Asp Val Asp Glu Glu Asp Glu Val Glu Asp Ile Glu Phe 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Asp Val Asp Gly Asp Gly His Ile Ser Gln Glu Glu Phe 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Asp His Asp Arg Asp Gly Phe Ile Ser Gln Glu Glu Phe 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Asp Gln Asn Gln Asp Gly Cys Ile Ser Arg Glu Glu Met 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Asp Val Asp Met Asp Gly Gln Ile Ser Lys Asp Glu Leu 
1 5 10 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

His Phe Val His Val Ala Glu Lys Leu Leu Gln Leu Gln Asn Phe Asn 
1 5 10 15 

Thr Leu Met Ala Val Val Gly Gly Leu Ser His Ser Ser Ile Ser Arg 
20 25 30 

Leu Lys Glu Thr His 
35 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Lys Phe Val His Val Ala Lys His Leu Arg Lys Ile Asn Asn Phe Asn 
1 5 10 15 

Thr Leu Met Ser Val Val Gly Gly Ile Thr His Ser Ser Val Ala Arg 
20 25 30 

Leu Ala Lys Thr Tyr 
35 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

His Asn Phe Gln Glu Ser Asn Ser Leu Arg Pro Val Ala Cys Arg His 
1 5 10 15 

Cys Lys Ala Leu Ile Leu Gly Ile Tyr Lys Gln Gly Leu Lys Cys Arg 
20 25 30 

Ala Cys Gly Val Asn Cys His Lys Gln Cys Lys Asp Arg Leu Ser Val 
35 40 45 

Glu Cys 
50 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

His Asn Phe His Glu Thr Thr Phe Leu Thr Pro Thr Thr Cys Asn His 
1 5 10 15 

Cys Asn Lys Leu Leu Trp Gly Ile Leu Arg Gln Gly Phe Lys Cys Lys 
20 25 30 

Asp Cys Gly Leu Ala Val His Ser Cys Cys Lys Ser Asn Ala Val Ala 
35 40 45 

Glu Cys 
50 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGGATCCCCC TGGTC 15 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
GAATTCGGCA CGAGCCGACG G 21 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 78 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
ATGGAGCAGA AGCTGATCTC CGAGGAGGAC CTGCCCGGGG CAGCTGGATC CGCAGCCCAC 60 
CCCGCGCCGG CGGCCATG 78 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Glu Gln Lys Leu Ile Ser Glu Glu Asp Leu Pro Gly Ala Ala Gly 
1 5 10 15 

Ser Ala Ala His Pro Ala Pro Ala Ala Met 
20 25 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
GGATCCGCAG CCCACCCCGC GCCGGCGGCC ATG 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 



Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met 
1 5 10 
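
The 33-base sequence listed as SEQ ID NO: 23 and the 11-residue sequence listed as SEQ ID NO: 24 appear to correspond as a coding sequence and its translation. The following minimal sketch checks this using the standard genetic code. The table construction and the `translate` helper are illustrative assumptions and are not taken from the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 into one-letter amino acid codes."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


seq23 = "GGATCCGCAG CCCACCCCGC GCCGGCGGCC ATG"  # SEQ ID NO: 23 as listed
print(translate(seq23))
```

The one-letter result, GSAAHPAPAAM, reads as Gly Ser Ala Ala His Pro Ala Pro Ala Ala Met in three-letter code, which agrees with the residues listed for SEQ ID NO: 24.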

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
GGACAAAGTG TGTGATGAAC C 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
CTCATCCTCC GTCTGATACT G 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GTAGATGTGG ATCAGCTTGG 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
AGGTGGAGAA TGGTCAAGG 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
GTCATAGTCT GTCTCCTACT 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
ACATAGACAG CGTGCCTACC 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
TACAACCTTA GGGACACCAG 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
TGCTGAGCCT GCTCACGGTG 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CAAGTGAACA GCACGTCC 
(2) INFORMATION FOR SEQ ID NO: 34: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 
GACTATCTCA AGGACCAGCT G 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
GGTTCGGTCC GAGCCCGG 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
GGAGCGATAC TCCAAGTAGG T 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
AGCGGGCCAG GCCCCTTC 



(2) INFORMATION FOR SEQ ID NO: 38: 



WO 98/53061 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
CATCCTGGTC CAATGCGCTC 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
GCACTGAGGA AGTTAAACGA GC 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
GCTCGTTTAA CTTCCTCAGT GC 
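
The 22-base sequences listed as SEQ ID NO: 39 and SEQ ID NO: 40 appear to be reverse complements of one another. The sketch below shows one way to confirm this. The helper name is hypothetical, and the check is offered as an observation on the listed sequences rather than a statement of how they were designed.

```python
# Complement mapping for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, ignoring spaces."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


seq39 = "GCACTGAGGA AGTTAAACGA GC"  # SEQ ID NO: 39 as listed
seq40 = "GCTCGTTTAA CTTCCTCAGT GC"  # SEQ ID NO: 40 as listed

# True when the two listed sequences are reverse complements.
is_pair = reverse_complement(seq39) == seq40.replace(" ", "")
print(is_pair)
```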
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 



GCTCAGCTCC ACAAAGCGGC T 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 



ACCAGCTCCG CTCAGGTAG 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 



TCCAGGAGCT GTGTGTTTGG 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44 



CCAGTTTCAC AGCGTGAGG 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 



CAGCATGAGG AGGAGGCAG 

CLAIMS: 

1. An isolated nucleic acid molecule comprising a sequence of nucleotides encoding or 
complementary to a sequence encoding an amino acid sequence having homology to a regulator 
of gene expression or a derivative of said gene regulator. 

2. An isolated nucleic acid molecule according to claim 1 wherein the regulator 
comprises a zinc finger domain of an (HC3)2 type. 

3. An isolated nucleic acid molecule according to claim 2 wherein the sequence of 
nucleotides or complementary sequence of nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

4. An isolated nucleic acid molecule according to claim 1 wherein said gene regulator is 
a guanine nucleotide exchange factor (GEF) or a derivative thereof. 

5. An isolated nucleic acid molecule according to claim 4 wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

6. An isolated nucleic acid molecule according to claim 1, wherein said gene regulator 
is a heat shock protein or is a heat shock binding protein or a derivative thereof. 

7. An isolated nucleic acid molecule according to claim 6, wherein the sequence of 
nucleotides is selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

8. A genetic construct comprising a vector portion and a gene portion comprising a 
regulator of gene expression or a derivative thereof. 

9. A genetic construct according to claim 8 wherein the gene portion comprises a zinc 
finger domain of (HC3)2 type. 

10. A genetic construct according to claim 9 wherein the gene portion comprises a 
nucleotide sequence selected from: 

(i) a nucleotide sequence set forth in SEQ ID NO:2; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:3; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 

11. A genetic construct according to claim 8 wherein said gene portion is a nucleotide 
exchange factor (GEF) or derivative thereof. 

12. A genetic construct according to claim 11 wherein the gene portion comprises a 
nucleotide sequence selected from: 



(i) a nucleotide sequence set forth in SEQ ID NO:4 or 6; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:5 or 
7; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 



13. A genetic construct according to claim 8 wherein the gene portion is a heat shock 
protein or a derivative thereof or a heat shock binding protein or derivative thereof. 

14. A genetic construct according to claim 13 wherein the gene portion comprises a 
nucleotide sequence selected from: 



(i) a nucleotide sequence set forth in SEQ ID NO:8; 

(ii) a nucleotide sequence encoding an amino acid sequence set forth in SEQ ID NO:9; 

(iii) a nucleotide sequence having at least about 40% similarity to the nucleotide sequence 
of (i) or (ii); and 

(iv) a nucleotide sequence capable of hybridising under low stringency conditions to the 
nucleotide sequence set forth in (i), (ii) or (iii). 



15. A nucleic acid molecule encoding a gene regulator having the identifying 
characteristics of a molecule selected from MCG4, MCG7 and MCG18 having respective amino 
acid sequences of SEQ ID NO:3, SEQ ID NO: 5 or 7 and SEQ ID NO:9. 






PCT/AU98/00380 




16. A method of detecting a condition caused or facilitated by an aberration in mcg4, said 
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg4 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

17. A method of detecting a condition caused or facilitated by an aberration in mcg4, said 
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG4 wherein the presence of such a mutation is indicative of said condition or 
of a propensity to develop said condition. 

18. A method for detecting MCG4 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG4 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG4 
complex to form, and then detecting said complex. 

19. A method of detecting a condition caused or facilitated by an aberration in mcg7, said 
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg7 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

20. A method of detecting a condition caused or facilitated by an aberration in mcg7, said 
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG7 wherein the presence of such a mutation is indicative of said condition or 
of a propensity to develop said condition. 

21. A method for detecting MCG7 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG7 or its 
derivatives or homologues for a time and under conditions sufficient for an antibody-MCG7 
complex to form, and then detecting said complex. 

22. A method of detecting a condition caused or facilitated by an aberration in mcg18, said 
method comprising determining the presence of a single or multiple nucleotide substitution, 
deletion and/or addition or other aberration to one or both alleles of said mcg18 wherein the 
presence of such a nucleotide substitution, deletion and/or addition or other aberration may be 
indicative of said condition or a propensity to develop said condition. 

23. A method of detecting a condition caused or facilitated by an aberration in mcg18, said 
method comprising screening for a single or multiple amino acid substitution, deletion and/or 
addition to MCG18 wherein the presence of such a mutation is indicative of said condition or 
of a propensity to develop said condition. 

24. A method for detecting MCG18 or a derivative thereof in a biological sample said 
method comprising contacting said biological sample with an antibody specific for MCG18 or 
its derivatives or homologues for a time and under conditions sufficient for an antibody-MCG18 
complex to form, and then detecting said complex. 
FIGURE 1

TCAGTAAACA CAGAGACTGG GGATCGATC ATG GGG CTT TGT AAG TGC CCC AAG 53
Met Gly Leu Cys Lys Cys Pro Lys
1 5

AGA AAG GTG ACC AAC CTG TTC TGC TTC GAA CAT CGG GTC AAC GTC TGC 101
Arg Lys Val Thr Asn Leu Phe Cys Phe Glu His Arg Val Asn Val Cys
10 15 20

GAG CAC TGC CTG GTA GCC AAT CAC GCC AAG TGC ATC GTC CAG TCC TAC 149
Glu His Cys Leu Val Ala Asn His Ala Lys Cys Ile Val Gln Ser Tyr
25 30 35 40

CTG CAA TGG CTC CAA GAT AGC GAC TAC AAC CCC AAT TGC CGC CTG TGC 197
Leu Gln Trp Leu Gln Asp Ser Asp Tyr Asn Pro Asn Cys Arg Leu Cys
45 50 55

AAC ATA CCC CTG GCC AGC CGA GAG ACG ACC CGC CTT GTC TGC TAT GAT 245
Asn Ile Pro Leu Ala Ser Arg Glu Thr Thr Arg Leu Val Cys Tyr Asp
60 65 70

CTC TTT CAC TGG GCC TGC CTC AAT GAA CGT GCT GCC CAG CTA CCC CGA 293
Leu Phe His Trp Ala Cys Leu Asn Glu Arg Ala Ala Gln Leu Pro Arg
75 80 85

AAC ACG GCA CCT GCC GGC TAT CAG TGC CCC AGC TGC AAT GGC CCC ATC 341
Asn Thr Ala Pro Ala Gly Tyr Gln Cys Pro Ser Cys Asn Gly Pro Ile
90 95 100

TTC CCC CCA ACC AAC CTG GCT GGC CCC GTG GCC TCC GCA CTG AGA GAG 389
Phe Pro Pro Thr Asn Leu Ala Gly Pro Val Ala Ser Ala Leu Arg Glu
105 110 115 120

AAG CTG GCC ACA GTC AAC TGG GCC CGG GCA GGA CTG GGC CTC CCT CTG 437
Lys Leu Ala Thr Val Asn Trp Ala Arg Ala Gly Leu Gly Leu Pro Leu
125 130 135

ATC GAT GAG GTG GTG AGC CCA GAG CCC GAG CCC CTC AAC ACG TCT GAC 485
Ile Asp Glu Val Val Ser Pro Glu Pro Glu Pro Leu Asn Thr Ser Asp
140 145 150

TTC TCT GAC TGG TCT AGT TTT AAT GCC AGC AGT ACC CCT GGA CCA GAG 533
Phe Ser Asp Trp Ser Ser Phe Asn Ala Ser Ser Thr Pro Gly Pro Glu
155 160 165

GAG GTA GAC AGC GCC TCT GCT GCC CCA GCC TTC TAC AGC CGA GCC CCC 581
Glu Val Asp Ser Ala Ser Ala Ala Pro Ala Phe Tyr Ser Arg Ala Pro
170 175 180

CGG CCC CCA GCT TCC CCA GGC CGG CCC GAG CAG CAC ACA GTG ATC CAC 629
Arg Pro Pro Ala Ser Pro Gly Arg Pro Glu Gln His Thr Val Ile His
185 190 195 200

ATG GGC AAT CCT GAG CCC TTG ACT CAC GCC CCT AGG AAG GTG TAT GAT 677
Met Gly Asn Pro Glu Pro Leu Thr His Ala Pro Arg Lys Val Tyr Asp
205 210 215
ACG CGG GAT GAT GAC CGG ACA CCA GGC CTC CAT GGA GAC TGT GAC GAT 725
Thr Arg Asp Asp Asp Arg Thr Pro Gly Leu His Gly Asp Cys Asp Asp
220 225 230

GAC AAG TAC CGA CGT CGG CCG GCC TTG GGT TGG CTG GCC CGG CTG CTA 773
Asp Lys Tyr Arg Arg Arg Pro Ala Leu Gly Trp Leu Ala Arg Leu Leu
235 240 245

AGG AGC CGG GCT GGG TCT CGG AAG CGG CCG CTG ACC CTG CTC CAG CGG 821
Arg Ser Arg Ala Gly Ser Arg Lys Arg Pro Leu Thr Leu Leu Gln Arg
250 255 260

GCG GGG CTG CTG CTA CTC TTG GGA CTG CTG GGC TTC CTG GCC CTC CTT 869
Ala Gly Leu Leu Leu Leu Leu Gly Leu Leu Gly Phe Leu Ala Leu Leu
265 270 275 280

GCC CTC ATG TCT CGC CTA GGC CGG GCC GCA GCT GAC AGC GAT CCC AAC 917
Ala Leu Met Ser Arg Leu Gly Arg Ala Ala Ala Asp Ser Asp Pro Asn
285 290 295

CTG GAC CCA CTC ATG AAC CCT CAC ATC CGC GTG GGC CCC TCC TGA 962
Leu Asp Pro Leu Met Asn Pro His Ile Arg Val Gly Pro Ser
300 305 310

GCCCCCTTGC TTGTGGCTAG GCCAGCCTAG GATGTGGGTT CTGTGGAGGA GAGGCGGGGT 1022
AATGGGGAGG CTGAGGGCAC CTCTTCACTG CCCCTCTCCC TCAAGCCTAA GACACTAAGA 1082
CCCCAGACCC AAAGCCAAGT CCACCAGAGT GGCTCGCAGG CCAGGCCTGG AGTCCCCGTG 1142
GGTCAAGCAT TTGTCTTGAC TTGCTTTCTC CCGGGTCTCC AGCCTCCGAC CCCTCGCCCC 1202
ATGAAGGAGC TGGCAGGTGG AAATAAACAA CAACTTTATT 1242
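
The codon-by-codon translation laid out in Figure 1 can be checked mechanically. The following is a minimal sketch only, assuming the standard genetic code; the helper name `translate` and the table construction are illustrative and are not part of the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon.

    Spaces between codons (as printed in the figure) are ignored.
    """
    cds = cds.replace(" ", "").upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First eight codons of the open reading frame shown in Figure 1.
first_codons = "ATG GGG CTT TGT AAG TGC CCC AAG"
print(translate(first_codons))  # MGLCKCPK
```

Running the same function over the full coding region (nucleotides 30 to 962 of the figure) should reproduce the 310-residue translation, provided the nucleotide text has been transcribed without error.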
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Figure 2 

gb|AA155210|AA155210 mr98e01.r1 Stratagene mouse embryonic carcinoma
(#937317) Mus musculus cDNA clone 605496 5'

Query: 1 MGLCKCPKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPL 60

MGLC KC P KRKVTNLFCFEHRVNVC EHC L VANHAKC WQ SYLQWLQ DS DYN FNCRLCN PL 
Sbjct: 98 ^LCKCPKRXVTNLFCFTHRVWCCTCLVANHAKC 2 n 1 

Figure 3 

dbj|D75913|CELK111G3F C.elegans cDNA clone yk111g3 : 5' end, single read.

Query: 7 PKRKVTNLFCFEHRVNVCEHCLVANHAKCIVQSYLQWLQDSDYNPNCRLCNIPLASRETT 66

PKRKVTNLF *EHRVNVCE LV NH OVQSYL WL D DY+PNC LC L *T 
SbjcC: 1- PKWCVTNLFXV^HRVNVCELXLVCNH FNCWQSYLTWLTDQD YD PNCSLCXTTUCT3GDT I 180 

Query: 67 RLVCYDLFHWACLNERAAQLPRNTAPAGYQCP 98 98 PSCNGPIFPPNQ 109 

RL C L HW C *E P TAP GY*CP P C* *FPP*Q 

Sbjct: 131 RLNCLHLLHWKC FDEWXGNF P DTTAPXGYRC P 276 275 PCCSQEVFPPDQ 310 
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Figure 4 




Figure 5 



sp|P46580|YLB5_CAEEL HYPOTHETICAL 146.8 KD PROTEIN C34E10.5 IN
CHROMOSOME III gi|500728 (U10402) C34E10.5 gene product
[Caenorhabditis elegans]

Query: 56 CNIPLASRETTRLVCYDLFHWACLNERAAQLPRNTAPAGYQCPSC 100

C+I L ♦ L C LF W Cf E A ♦ + ♦ *CP C 

Sbjct: 1222 CSICI^IJKNPSALFCXIHI^CVmriQEJiAVAATSSASTSSARCPQC 1266 



Figure 6 

gi|703468 (L29051) homologous to GATA-binding transcription factor
[Schizosaccharomyces pombe]

Query: 35 CIVQSYLQWLQDSDYNPNCRLCNI 58

C ♦ +W *D MP C C + 
Sbjct: 175 CATTNTPKWRRDESGNP ICNACGL 198 

Query: 162 SSTPGPEEVDSASAAPAFYSRAPRPPASPGRPEQHTVIHMGNPEPLTHAPRKVYDTRDDD 221

+S PEE S S S P* SP ♦ +Q «-I P +V ♦ D 

Sbjct: 441 ASU^EEPPSNSDKQPSMSNGPKSEVSPSQSQQAPLIQSSTSPVSLQFPPEVQGSN^/DK 500 



Query: 222 RTPGLH 227
R L+
Sbjct: 501 RNYALN 506
gb|AA074703|AA074703 zm76g07.r1 Stratagene neuroepithelium (#937231)
Homo sapiens cDNA clone 531612 5'
Length = 417

Plus Strand HSPs:

Score = 818 (226.0 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 206/259 (79%), Positives = 206/259 (79%), Strand = Plus / Plus

Query: 446 (XXXTCCCTCITiATXrGATG 505 

II IIIIIIIIIHIIIIIIilll I lllllfllllillllllllllll I II III 
Sbjct: 19 QQGCTCCCICn^TCGArGA£X7X^ 108 

Query: 506 TTCTCTC^CTOGTCTAOTTTTAAT^ 565 

iiiiiiii inn ii iiiiiiiiii ii iii iii i iiiiiii ii mi 

Sbjct: 109 TTCTCTCATTGGTCCAGCTTTAATG^ 168 
Query: 566 GCCTCTCXTTGCCCCAGCCTT^ 625 

i i mi ii ii inn iiiiiiii inn ii ii i mm mi 

Sbjct: 169 ACTCCATCTtXJVCCTCOT 228 
Query: 626 CCCGAGCACX^ACAGTGA^ 685 

mmimmim n iimm i i m; n n i mum m 

Sbjct: 229 CCCGAGCAGCACACAGTCAT^ 288 

Query: 686 AAGGTCTATGATACGCGGG 704 

II II Hill II I II 
Sbjct: 289 AAAGTATATCACACACCGG 307 

Score = 230 (63.6 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 50/55 (90%), Positives = 50/55 (90%), Strand = Plus / Plus

Query: 398 GCACIXSAGAGtfUGAAGCTQGOCACAGTC^ 452 

iiirimm Mill 1 1 1 f I M 1 1 1 i 1 1 M 1 1 1 f 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

Sbjct: 2 QCACICftGAGWJtfCTAt^^ 56 

Score = 175 (48.4 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 39/44 (88%), Positives = 39/44 (88%), Strand = Plus / Plus 

Query: 767 GCCTOXXnTOXriCCXXCGCX^^ 810 

Sbjct- 373 ILJiLL!!^^ 416 

Score = 139 (38.4 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 31/35 (88%), Positives = 31/35 (88%), Strand = Plus / Plus 

Query: 731 QGAGACTCnGACGATGACAAGTACCGACGTCGGCC 765 

iiiiimiii iiiiiiii inn ii mil 

Sbjct: 336 GGAGACTGTGAITIATGACAAATACCGCCOCCGGCC 370 

Score = 133 (36.8 bits), Expect = 6.1e-103, Sum P(5) = 6.1e-103
Identities = 29/32 (90%), Positives = 29/32 (90%), Strand = Plus / Plus

Query: 701 CGGGATGATGACCGGACACCAGGCCTCCATGG 732
imimmmmi urn i tint
Sbjct: 305 CC?GGATGATGACCGGACAGCAGOCATTCATGG 336



WO 98/53061 

Figure 8 continued 



7/32 



PCT/AU98/00380 



gb|AA134788|AA134788 zm81g02.r1 Stratagene neuroepithelium (#937231)
Homo sapiens cDNA clone 532082 5'
Length = 368

Plus Strand HSPs:

Score = 563 (155.6 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87
Identities = 147/190 (77%), Positives = 147/190 (77%), Strand = Plus / Plus



Query: 498 OTKTKIACTTCTCTOU^ 557
I II HIIIIIIIII IIIII II llllllllll II III III I lllllll
Sbjct: 103 CCTCAGACTIKTTCTCATTGGTCCAGC^ 162

Query: 558 TAGACAGCGCCTCTGCTOTC^^ 617
II UN I I IIII II I I IIII I IIIIIIII IIIII II II I IIII
Sbjct: 163 GAGCCAGCACTCCATCTGCQC^ 222

Query: 618 CAGGCCCX3CCO»GCAGCA^ 677
II Nil IMIIIMIIIIIIIII II IIIIIIII I I IIII 1 1 II I MM
Sbjct: 223 CAAGCCCTCCCGAGCAGCACA 282

Query: 678 CCCCTAGGAA 687
MM Mill
Sbjct: 283 CCCCAAGGAA 292



Score = 454 (125.4 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87
Identities = 94/98 (95%), Positives = 94/98 (95%), Strand = Plus / Plus 

Query: 398 GCACTGAGAGAGAAGCTCGCCACAGTCAAro 457 

sbjct- 2 (LyL^u^ 61 

Query: 458 ATCGATCAOGTOGTGAOCCCAGLWSCCC^ 495 

minium 1 iiiiiiiimiiiiiiiimi 

Sbjct: 62 ATCGATGAGGTGATAAGCCO^^ 99 

Score = 219 (60.5 bits), Expect = 3.8e-87, Sum P(3) = 3.8e-87 
Identities = 51/60 (85%), Positives = 51/60 (85%), Strand = Plus / Plus

Query: 702 GGGATCATC^CCGGACACCAGGCCTC 761 

11 iiiiiiiiiiiii 11111 1 imiMiimii iiiiiiii iiiii 11 1 

Sbjct: 309 QGATTGATGACCGGACAGCAGGCATT^ 368 



Figure 9 

W32939 human TALUJOAJTllXXSAACXa^^ 

AA242159 mouse L"inXXGCXX.U'lunxjATTACCGTACGCACCGGTCA~ CGATCOGCATCXSCOC^OC^TCGGTCA 
MCG4 MGLCKCPKRK VTNLFCFEHR VNVCEHCLVA NHAKCIVQSY LQWLQDSDYN PNCRLCNIPL 60

MCG4 ASRETTRLVC YDLFHWACLN ERAAQLPRNT APAGYQCPSC NGPIFPPTNL AGPVASALRE 120 

3. 

[ 229 J ***x> 

5. 

[ 74 ] *.**> 

130 140 150 160 170 180 

• * * • * 

MCG4 KLATVNWARA GLGLPLIDEV VSPEPEPLNT SDFSDWSSFN ASSTPGPEEV DSASAAPAFY

1. 20 30 40 50 60 

[ 372 J ^•••***** 5 »#****#*»* * tc * SV q#« r a *tps*****> 

2. 30 40 50 60 

[ 243 J aqs*s*sip *tt*svq**r a*tps*****> 

*P 

I 

3. 10 20 30 40 50 60 | 

[ 229 ] •*»••*«*** j^.. 5 xr n*ivql* chhhlcarge sqh*icac*l> 



s| s 

II I 
5. 10 J | 30 40 50 60 j 

£ 7 4 j ***»** x *». •«»» smr .* a q*» s *- s ipq tslig-pal- irppp*lcJcrr ep*lhlxlli> 

190 200 210 220 230 240 

R • 

MCG4 SRAPRPPASP GRPEQHTVIH MGNPEPLTHA PRKVYDTRDD DRTPGLHGDC DDDKYRRRPA

* * 

I 

1. . 70 80 90 100 | 110 120 

[ 372 J *\***#*p*« s «*»*##*»* «*st*a*a* # *******pgp *srhswetvn mtnt-aagl*> 

2. V 70 80 90 

C 243 ] *y»*#* p »* s **######* **st*a*a** ***> 

* i 

3. 70 80 90 100 j 110 120 

( 229 ] gsp*sslpk* s*a-a*sht* gey*s*g*r- *kek*m*hg* *** a *i # **» ♦*****•> 

4. 70 80 90 100 110 120 

( 86 J p*sslpk* s*a-a*sht* gey*s*g*rp kesi*h # gnm tgqqafm*** *********c> 

h 
I 

5. 70 80 90 100 110 

t 74 ) arl'allppq av*sstqsyt w*vlk*w-*t *qgk*m**** ***a*i**> 

I 

6. jlOO 

1 38 1 *t * q *******> 

250 260 270 280 290 300 

* * * * * * 

MCG4 LGWLARLLRS RAGSRKRPLT LLQRAGLLLL LGLLGFLALL ALMSRLGRAA ADSDPNLDPL

1. 130 
( 372 ] q > 

4. 

t 86 ] s*-**> 

310 
* 

MCG4 MNPHIRVGPS
Search Analysis for Sequence: MCG4
Search from 1 to 310
Date: September 22, 1997

Aligned sequences:

1. = EST AA074703 phase 1 translation
2. = EST AA134788 phase 3 translation
3. = EST AA134788 phase 2 translation
4. = EST AA074703 phase 3 translation
5. = EST AA074703 phase 2 translation
6. = EST AA134788 phase 1 translation

Matrix: pam250 matrix
Score Region from 1 to 310
Maximum possible score: 1598
FIGURE 11 Domains of MCG4

Domain (amino acid positions):
zinc finger: 16-100
acidic: 138-171
basic: 234-288
leucine zipper: 241-269

zinc finger consensus: CX2HX4CX2CX4HX2CX17CX2CX18HX2CX18CX2C
acidic domain consensus: 9/34 negatively charged amino acids, 0/34 positively charged
basic domain consensus: 13/55 positively charged amino acids, 0/55 negatively charged
leucine zipper domain consensus: LX6LX6RX6LX6L

alternate "novel" leucine zipper-like motif where leucine would not be aligned along
the one surface of an alpha helix domain: (aa 261) LX LXLX LXLX L (aa 286)
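
The consensus patterns listed for Figure 11 can be tested against the MCG4 residues given in Figure 1. The sketch below is illustrative only: the spacer lengths 17 and 18 in the zinc finger pattern are inferred here by counting residues between the conserved Cys/His positions in Figure 1, the sequence fragments are transcribed from that figure, and histidine is not counted as positively charged. The names `ZINC_FINGER`, `LEUCINE_ZIPPER` and `charge_counts` are hypothetical.

```python
import re

# MCG4 residues 1-100, transcribed from the Figure 1 translation.
MCG4_1_100 = (
    "MGLCKCPK"
    "RKVTNLFCFEHRVNVC"
    "EHCLVANHAKCIVQSY"
    "LQWLQDSDYNPNCRLC"
    "NIPLASRETTRLVCYD"
    "LFHWACLNERAAQLPR"
    "NTAPAGYQCPSC"
)
# Residues 138-171 (acidic region) and 241-269 (leucine zipper region).
MCG4_138_171 = "DEVVSPEPEPLNTSDFSDWSSFNASSTPGPEEVD"
MCG4_241_269 = "LGWLARLLRSRAGSRKRPLTLLQRAGLLL"

# Consensus patterns written as regular expressions ("X" = any residue).
ZINC_FINGER = re.compile(
    r"C.{2}H.{4}C.{2}C.{4}H.{2}C.{17}C.{2}C.{18}H.{2}C.{18}C.{2}C"
)
LEUCINE_ZIPPER = re.compile(r"L.{6}L.{6}R.{6}L.{6}L")


def charge_counts(segment: str) -> tuple:
    """Return (negative, positive) residue counts: D/E versus K/R."""
    negative = sum(segment.count(aa) for aa in "DE")
    positive = sum(segment.count(aa) for aa in "KR")
    return negative, positive


match = ZINC_FINGER.search(MCG4_1_100)
print(match.start() + 1, match.end())            # 16 100
print(charge_counts(MCG4_138_171))               # (9, 0)
print(bool(LEUCINE_ZIPPER.fullmatch(MCG4_241_269)))  # True
```

Under these assumptions the zinc finger pattern spans residues 16 to 100 and the acidic region contains 9 negatively charged residues out of 34, in agreement with the domain boundaries and counts stated in the figure.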
FIGURE 12

                                                                          Smallest
                                                                            Sum
                                                                   High  Probability
Sequences producing High-scoring Segment Pairs:                   Score     P(N)      N

gnl|PID|e236178        (Z70752) F25B3.3 [Caenorhabditis ele...       307  3.0e-124   8
gi|1293099             (U53884) aimless RasGEF [Dictyosteli...       202  7.8e-22
gi|1655941             (U67326) Ras-GRF2 [Mus musculus]              152  3.6e-16    4
pir||S30356            CDC25 protein homolog - yeast (Candi...       150  2.2e-15    3
sp|P43069|CC25_CANAL   CELL DIVISION CONTROL PROTEIN 25              150  2.2e-15    3
sp|P28818|GNRP_RAT     GUANINE NUCLEOTIDE RELEASING PROTEIN...       166  2.6e-15    3
prf||1814463A          guanine nucleotide-releasing factor ...       166  ?.6e-15
pir||B46199            nucleotide-exchange-factor homolog c...       167  1.1e-14
gnl|PID|e238680        (X97560) hypothetical protein L1309 ...       158  3.0e-14    3
pir||S22693            CDC25 protein homolog - mouse /gi|50...       167  3.7e-14
sp|P14771|SC25_YEAST   SCD25 PROTEIN /gi|457494 (M26647) SD...       158  4.6e-14    3
sp|P26674|STE6_SCHPO   STE6 PROTEIN /pir||S28098 ste6 prote...       160  5.2e-14    2
pir||S28407            CDC25 protein homolog - mouse                 167  1.2e-13    3
sp|P27671|GNRP_MOUSE   GUANINE NUCLEOTIDE RELEASING PROTEIN...       167  1.2e-13    3
gi|386047              (S62035) Ras-specific guanine nucleo...       153  2.0e-13    2
sp|Q02342|CC25_SACKL   CELL DIVISION CONTROL PROTEIN 25 /pi...       142  4.5e-13    2
pir||S14177            SCD25 protein - yeast (Saccharomyces...       152  5.7e-13    3
gi|433720              (L26584) CDC25 [Homo sapiens]                 153  6.0e-13
gnl|PID|e241744        (Z68880) T14G10.2 [Caenorhabditis el...       157  7.2e-13
gi|3484                (X03579) CDC25 protein (aa 1-1588) [...       136  3.4e-12
sp|P04821|CC25_YEAST   CELL DIVISION CONTROL PROTEIN 25 /pi...       136  3.4e-12
gi|915328              (U24070) Munc13-1 [Rattus norvegicus]         151  5.5e-12
pir||A46199            nucleotide-exchange-factor homolog c...       149  5.6e-12
pdb|1PTR|              Molecule: Protein Kinase C Delta Ty...        136  1.5e-11
gi|915330              (U24071) Munc13-2 [Rattus norvegicus]         150  1.6e-11
gi|474982              (D21239) 'C3G protein' [Homo sapiens.         131  3.3e-11
gi|1763306             (U75361) Munc13-3 [Rattus norvegicus]         153  6.4e-11    2
gi|806957              guanine-nucleotide exchange factor C...       128  7.8e-11
sp|Q03385|GNDS_MOUSE   GUANINE NUCLEOTIDE DISSOCIATION STIM...       133  1.0e-10
pir||BVHYL1            LTE1 protein - yeast (Saccharomyces ...       139  1.9e-10
gi|452242              (D21354) a putative guanine nucleoti...       139  2.7e-10
sp|P07866|LTE1_YEAST   LOW TEMPERATURE ESSENTIAL PROTEIN /p...       139  2.7e-10
gi|509050              (Z22521) protein kinase C delta [Hom...       137  4.0e-10
gi|520587              (D10495) protein kinase C delta-type...       137  4.6e-10
sp|P05130|KPC1_DROME   PROTEIN KINASE C, BRAIN ISOZYME (PKC...       137  4.7e-10
pir||S35704            protein kinase C (EC 2.7.1.-) delta ...       137  4.7e-10
sp|Q05655|KPCD_HUMAN   PROTEIN KINASE C, DELTA TYPE (NPKC-D...       137  4.7e-10
pir||S40279            protein kinase C mu - human /pir||A5...       137  4.9e-10
sp|P09215|KPCD_RAT     PROTEIN KINASE C, DELTA TYPE (NPKC-D...       135  9.0e-10
gi|52087e              (Z34524) serine/threonine protein ki...       133  1.8e-09
gi|1519719             (U68142) RalGDS-like [Homo sapiens]           115  3.8e-09
FIGURE 13(a) (i)

MCG7 - Cloning of a novel human gene that encodes a guanine nucleotide exchange factor

CGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAAAATATTCCCATCTTGTCCTAG 60
ISFLAPHRSLSPKYSHLVL 19
CCCATCCCCCAGACTATCTCAAGGACCAGCTGTCCCCACGCCCCCGACCTCCACTAGGCC 120 
AHP PDYLKDQLS PRPRP PLG 39 
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA 180 
LCHPLPAGRRPVPGRVSPMG 59 
CGcagcgcctgtgtggccgcgggactcaaggctggcctggctcaagtgaacagcacgtcc 240 
TQRLCGRGTQGWPGSSEQHV 79 
aggaggcgacctcgtccgcgggtttgcattctggggtggacgagctggGGGTTCGGTCCG 300
QEATSSAGLHSGVDELGVRS 99
AGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCCCGCGCCGGCGGCCA 360
EPGGRLPERS LGPAHPAPAA 119 
TGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCTGCTCCGCGGGTGCA 420 
MAGTLDLDKG.CTVEELLRGC 139 
TCGAAGCCTTCGATGACTCCGGGAAGGTGCGGGACCCGCAGCTGGTGCGCATGTTCCTCA 4 80 
IEAFDDSGKVRDPQLVRMFL 159 
TGATGCACCCCTGGTACATCCCCTCCTCTCAGCTGGCGGCCAAGCTGCTCCACATCTACC 540 
HMHPWYI PS SQLAAKLLHIY 179 
AACAATCCCGGAAGGACAACTCCAATTCCCTGCAGGTGAAAACGTGCCACCTGGTCAGGT 600 
QQS R KDNSKS LQVKTCHLVR 199 
ACTGGATCTCCGCCTTCCCAGCGGAGTTTGACTTGAACCCGGAGTTGGCTGAGCAGATCA 660 
YWISAFPAEFDLNPELAEQI 219 
AGGAGCTGAAGGCTCTGCTAGACCAAGAAGGGAACCGACGGCACAGCAGCCTAATCGACA 720 
KELKALLDQEGNRRHSSLID 239 
TAGACAGCGTCCCTACCTACAAGTGGAAGCGGCAGGTGACTCAGCGGAACCCTGTGGGAC 780 
I DSV PTYKWKRQVTQRNPVG 259 
AGAAAAAGCGCAAGATGTCCCTGTTGTTTGACCACCTGGAGCCCATGGAGCTGGCGGAGC 840 
QKKRKMSLLFDHLEPMELAE 279 
ATCTCACCTACTTGGAGTATCGCTCCTTCTGC^AGATCCTGTTTCAGGACTATCACAGTT 900 
HLTYLEYRS FCK ILFQDYHS 299 
TCGTGACTCATGGCTGCACTGTGGACAACCCCGTCCTGGAGCGGTTCATCrCCCTCTTCA 960 
FVTHGCTVDNPVLERFISLF 319 
ACAGCGTCTCACAGTGGGTGCAGCTCATGATCCTCAGCAAACCCAC AGCCCCGCAGCGGG 1020 
NSVS QWVQLMILSKPTAPQR 339 
CCCTGGTCATCACACACTTTGTCCACGTGGCGGAGAAGCTGCTACAGCTGCAGAACrTCA 1080 
A L V I TH FVHVAE KLLQLQNF 359 
ACACGCTGATGGCAGTGGTCGGGGGCCTGAGCCACAGCTCCATCTCCCGCCTCAAGGAGA 1140 
NTLMAVVGGLSHSSX.SRLKE 379 
CCCACAGCCACGTTAGCCCTGAGACCATCAAGCTCTGGGAGGGTCTCACGGAACTAGTGA 1200 
THS HVS PET I KLWEGLTELV 399 
CGGCGACAGGCAACTATGGCAACTACCGGCGTCGGCTGGCAGCCTGTGTGGGCTTCCGCT 1260 
TATGNYGNYRRRLAACVGFR 419 
TCCCGATCCTX3GGTGTGCACCTCAAGGACCTGGTGGCCCTGCAGCTGGCACTGCCTGACT 1320 
FP I LGVHLKDLVALQLALPD 439 
GGCTGGACCCAGCCCGGACCCGGCTCAACGGGGCCAAGATGAAGCAGCTCTTTAGCATCC 1380 
WLDPARTRLNGAKMKQLFSI 459 
TGGAGGAGCTGGCCATGGTGACCAGCCTGCGGCCACCAGTACAGGCCAACCCCGACCTGC 1440 
LEE LAMVTS LRP PVQANPDL 479 
r TGAGCCTGCTCACGGTGTCTCTGGATCAGTATCAGACGGAGGATGAGCrGTACCAGCTGT 1500 
LSLLTVSLDQYQTEDELYQL 499 
CCCTGCAGCC<X?AGCCGCGCTCCAAGTCCTCGC 1560 
SLQREPRSKSSPTSPTSCTP 519 
CACC C CGGC C C C CGGT ACTGG AGG AG TGG AC CT CGG CTG C CAAAC C CAAGCTGGATCAGG 1620 
PPRP PVLEEWTSAAKPKLDQ 539 
CCCTCGTGGTGGAGCACATCGAGAAGATGGTGGAGTCTGTGTT^ 1^80 
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FIGURE 13(a) (ii) 



ALVVEH IEKMVESVFRNFDV 559 
ATGGGGATGGCCACATCTCACAGGAAGAATTCCAGATCATCCGTGGGAACrrCCCTTACC 1740 
DGDGKIS QEEFQI I R G N F P Y 579 
TCAGCGCCTTTGGGGACCTCGACCAGAACCAGGATGGCTGCATCAGCAGGGAGGAGATGG 1800 
LSAFGDLDQNQDGCI SREBM 599 
TTTCCTATTTCCTGCGCTCCAGCTCTGTGTTGG 1860 
V SYFLRSSSVLGGRMGFVHK €19 
TCCAGGAGAGCAACTCCTTGCGCCCCGTCGCCTGCCGCCACTGCAAAGCCCTGATCCTGG 1920 
FQESNSLRPVACRHCKALIL 639 
GCATCTACAAGCAGGGCCTCAAATGCCGAGCCTGTGGAGTGAACTGCCACAAGCAGTGCA 198 0 
GIYKQGLKCRACGVNCHKQC 659 
AGGATCGCCTGTCAGTTGAGTGTCGGCGCAGGGCCCAGAGTGTGAGCCTGGAGGGGTCTG 2040 
KDR LSVECRRRAQ S~VS LEGS 679 
CACCCTCACCCTCACCCATGCACAGCCACCATCACCGCGCCTTCAGCTTCTCTCTGCCCC 2100 
APS PSPMHS HHHRAFS FSLP 699 
GCCCTGGCAGGCGAGGCTCCAGGCCTCCAGAGATCCGTGAGGAGGAGGTACAGACGGTGG 2 160 
RPG RRGS R P PE I REEEV QTV 719 
AGGATGGGGTGTTTGACATCCACTTGTAATAGATGCTGTOT 2220 
EDGV FD I H L * r%v 728 
CTGCCTTGG AGAAAAT ACTTCAAC CAGAG CAGGGAGCCTGGGGGTGTCGGGG CAGG AGGC 2280 
TGGGGATGGGGGTGGGATATGAGGGTGGCATGCAGCTGAGGGCAGGGCCAGGGCTGGTGT 2340 
CCCTAAGGTTGTACAGACTCTTGTGAATATTTGTA1TTTCCAGATGGAATAAAAAGGCCC 2400 



GTGTAATTAACCTTC (A) n 
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FIGURE 13(b) 



CGATTTCATTCCTCGCTCCCCACAGGTCCCrrCTCCCCAAAATATTCCCATCTTGTCCTAG 6 0 
CCCATCCCCCAGACTATCTC^GGACaVGCTGTCCCCACGCCCCCGACCrrCCACTAGGCC 120 
TGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGGGCCGGGTTAGCCCCATGGGAA £8 0 * 

* p h g n 

CGGGGTTCGGTCCGAGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAGCCCACCC*2r^O 
gvrsepggrlperslgpahp 

CGCGCCGGCGGCCATGGCAGGCACCCTGGACCTGGACAAGGGCTGCACGGTGGAGGAGCT*^^^ 
a p a a MAG T L D - L D KG C T V E E L 
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FIGURE 14 



1 MAGTLDLDKGC. . . TVEELLRGCIEAF . . DDSGKVRDPQLVRMFLMMHPW 45 

|.:.:: |. I : = |: hi- I |:.:.| | ::| ::: |.| 

1 MSSKVEEDQHOELLTEDQLVARCVECF DVDEEDEVEDIEFV DALFLSHOW 50 

46 YIPSSQLAAKLLHIYQQSRKDNSNSLQVKTCHLVRYWISAFPAEFDLNPE 95 

- -I | ..::::( |:.|. : . | : | . : | |. | | . | I . | = 

51 LSDSLSLITHFVNFYQETRNVEQRE . . . AVCRAVSFWIEKFPMHFDAQPQ 97 

96 LAEOIKELKALLDQEGNRRHSSLIDIDSVPTYKWKRQVTQRNPVGQKK. . 143 

:..|: ||.: :: :.:| |:..:|.: | |.|. |||::... 

98 VCAQWRLKTIAEDINENIRNGL . DVSALPSFAWLRAVSVRNPLAKQTIV 146 

144 RKMSLLFDHLEPMELAEHLTYLEYR 168 

:|! I : -I |.-::|| 
147 RVDFETLPTPGTPPPFPIASKKFSLTAFSLSFVQASPSDISTSLSHIDYR 196 

169 SFCKILFQDYHSFVTHGCTVDNPVLERFISLFNSVSQWVQLMILSKPTAP 218 

:::| : :.. :|..| . hill ||:||.:|.||| I I I - I - I = - 
197 VLSRI S ITELKQYVKDGHLRSCPMLERS I SVFNNLSNWVQCMI LNKTT PK 246 

219 QRALVIT HFVHVAEKLLQLONFNTLMAWGGLSHSSISRLKETH SHVSPE 268 

:|| ::..|||||-.| . | | | | | | . | | | | : . || | :. | | .|.. :| = 
247 ERAEILV KFVHVAKHLRKINWFNTLMSWCMITHSWARIAKTYA VLSND 296 

269 TIKLWEGLTELVTATGNYGNYRRRLAAC . VGFRFPILGVHLKDLVALQLA 317 

• | =-.|I:|:.| l = .:|h 1 = 11 I I : I I = I I I I I I I I I = • * 
297 I KKELTQLTNLLSAQHNFCEYRKALGACNKKFRI PIIGVHLKDLVAINCS 340 

318 LPDWLDPARTRLNGAKMKQLFSILEELAMVTSLRPPV . QANPDLLSLLTV 366 

::: . : . : . | : . | . : | . : : : . . . : : | ||:. |.| 

347 GANFEKT. . KCISSDKLVKLSKLLSNFLVFNQKGHNLPEMNMDLINTLKV 394 

367 SLDQYQTEDELYQLSLQREPRSKSSPTSPTSCTPPPRPPVLEEWTSAAKP 416 

III .:|::|:||Ulh. • . | . | . | : . | | . | : . . . 

395 SLDIRYNDDDIYELSLRREPKTFMN FEPSRGLVFAEWASGVTV 437 

417 KLDOALVVEHIEKMVESV FRNF DVDGijGHI SQEEFO I IRGNFr YLSAFGD 466 

|.| I -||. I| = -I|:: = l I II II III lh I lllh = -ll- = 
438 A PDNATVS KH I S AMVDA VFKH YDHDRDG F t SO EEFQ L I AGNF PF I DAFVN 487 

467 L DQNQDGC T SREEMV SYFLRSS . SVLGGRMGFV HNFOESNSLRPVACRHC 515 

:| : 11 ||::|: -lh : Ml |||:|.. I I - - I - I I 

488 I DVDMDGQ I S KDEL KTYFMAANKNTKDLRRGFK HNFHETTFLT PTT CNHC 537 

516 KALII.GT YKOGLKCRACGVNCHKOCKDRLSVEC RRRAQSVSLEGSAPSPS 565 

. |::||.:|hlh.|l = -.|- II-- -lllh- t : l 
5 3 8 NKLLWGTr^RQGFKCKDCGLAVHSCCKSMAVAECRRKSSSNLTRAAEWFAS 587 

566 PMHSHHHRAFSFSLPRPGRRGSRPPEIREEEVQTVEDGVFDIHL 609 

|. I : I : .-.MM * ..|:.. | -| 

588 PRGSMRSRIINTC NNSGSTPDEEIGLVSLACEEVFEDDDL 627 
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FIGURE 15 



human CGATTTCATT CCTCGCTCCC CACAGGTCCC TCTCCCCAAA ATATTCCCAT CTTGTCCTAG 60 

human CCCATCCCCC AGACTATCTC AAGGACCAGC TGTCCCCACG CCCCCGACCT CCACTAOGCC 120 

human TGTGCCACCC GCTGCCTGCA GGAAGACGCC CGGTCCCGGG CCOGGTTAGC CCCATGGGAA 180 

human CGCAGCGCCT GTGTGGCCGC GGGACTCAAG GCTGGCCTGG CTCAAGTGAA CAOCACGTCC 240 
mouse ***tcag** ****ag**** t ********* *** a *g***t> 

human AOGAGGCGAC CTCGTCCGCG GGTTTGCATT CTOGGGTOGA CGAGCTGOGG GTTCOGTCCG 300 

acagg 

i 
i 

mouse g****»t**a **-*catt* # ********** *** aa ** aa * g**ct**"** **a**aat**> 

human AGCCCGGTGG GAGGCTCCCG GAGCGCAGCC TGGGCCCAGC CCACCCCGCG CCGGCGGCC& 360 

mouse *** a * t **#» ******* tgaL *** t * t * a * t **** t * t *** ***-*tg**a ***** a ****> 

human TQQCAGGCAC CCTGGACCTG GACAAGGGCT GCACGGTGGA GGAGCTGCTC CGCGGGTGCA 420 

mouse **** ga **** £*••**#**• ******** t * **** c ***** ********** *» t ** c ** t * > 

human TCGAAGCCTT CGATGACTCC GGGAAOGTGC GGGACCCGCA GCTGGTOCGC ATGTTCCTCA 480 

mouse ********** t *«****»* t ** a ******* * a **t*»a** *** a ****** *****t****> 

human TGATGCACCC CTGGTACATC CCCTCCTCTC AGCTCGCGGC OUU3CTGCTC CACATCTACC 540 

mouse •*♦*#*•*** ********* a ** t ******* ******* tt * g ** a ****** *** t *«** t * > 

human AACAATCCCG GAAGGACAAC TCCAATTCCC TGCAGGTGAA AACGTOCCAC CTGGTCAOGT 600 

mouse *g******** ********** ******** t * « a *** a **«* ****** t *** t ****«**«* > 

human ACTGGATCTC CGCCTTCCCA GCGGAGTTTG ACTTGAACCC GGAGTTGGCT GAGCAGATCA 660 

mouse *•*••**♦** a ********* ** a »»*** c * ****«•••** a **« c ****« ** a ****««* > 

human AGGAGCTGAA GGCTCTGCTA GACCAAGAAG GGAACCGACG GCACAGCAGC CTAATCGACA 720 

mouse ********** *****•*(.*« ********** **«**«* ca * ********** *« c *#***** > 

human TAGACAGCGT 730 

mouse * c *«g** t ** 
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FIGURE 16 



CACGCCTCGGAAGGGAGGTTTGGGGTCGGTGGTTTCACAGTGAGTGTGTCTGAAGCCAAA 60 
TGGTCGGAAACCGTTACCCGCTCTCCTAGGCCCGGCTAGTGGGGACCCCAACCGCCTGCG 120 

* ARLVGTPTAC> 
GCTGCCCCTCCCAAGTTCCTCCCTGTTGGCCAGGCATCCAGGTCTCCAGTCTCCGAGCTG 180 
GCPSQVPPCWPGIQVSSLRA> 
CGGAGAACCCACCGCCACATGCGGCTGCCCCTTTCCATTCGACCCTGTGGGGAGCCAGGC 240 
AENPPPHA AAPFHSTLWGAR> 
TTCCGGGGCCCCGTTCCTCCTGTGTGAACTGGGCCCCCCGCCCCCATTCCCAGACATCAA 300 
LPGPRSSCVNWAPRPHSQTS> 
GGCCGCGTCTCCAGATAGCCACGATTTCATTCCTCGCTCCCCACAGGTCCCTCTCCCCAA 3 6 0 
RPRLQIATISFLAPHRSLS P> 
AATATTCCCATCTTGTCCTAGCCCATCC "CCAGACTATCTCAAGGACCAGCTGTCCCCAC 4 2 0 
KYSHLVLAHPPDYLKDQLS P> 
GCCCCCGACCTCCACTAGGCCTGTGCCACCCGCTGCCTGCAGGAAGACGCCCGGTCCCGG 48 0 
RPRPPLGLCHPLPAGRRPVP> 
GCCGGGTTAGCCCCATGGGAACGcagcgcctgtgtggccgcgggactcaaggctggcctg 54 0 
* p h g n 

GRVS PMGTQRLCGRGTQGWP> 
gctcaagtgaacagcacgtccaggaggcgacctcgtccgcgggtttgcattctggggtgg 600 
GS S E QHVQEATS S AG LHS GV> 
acgagc tggGGGTTCGGTCCGAGCCCGGTGGGAGGCTCCCGGAGCGCAGCCTGGGCCCAG 66 0 
DELGVRSEPGGRLPERSLGP> 
CCCACCCCGCGCCGGCGGCC^TGGCAGGC^CCCTGGACCTGGACAAGGGCTGCACGGTGG 720 
AH PAPAAMAGTLDLD KGCT V> 
FIGURE 18 (Cont. I)

SmaI/ApaI (both lost) 0.00
ApaI/SmaI (both lost) 1.00

Plasmid name: clone 16 in pGEX-3X
Plasmid size: 6.00 kb

FIGURE 18 (Cont. II)

EcoRI 0.00

Plasmid name: clone 19 in pGEX-1
Plasmid size: 6.00 kb

HindIII 2.50

Plasmid name: clone 5 in pGEM-11zf
Plasmid size: 5.50 kb

FIGURE 18 (Cont. IV)

(BamHI) 0.00
GEF domain
HindIII (lost)/SmaI (lost) 2.40

Plasmid name: clone 27 in pGEX-2T
Plasmid size: 7.50 kb
FIGURE 19

GCCCGCCGCC ATG CCG CCC TTA CTG CCC CTG CGC CTG TGC CGG CTG TGG 49
Met Pro Pro Leu Leu Pro Leu Arg Leu Cys Arg Leu Trp
1 5 10

CCC CGC AAC CCT CCC TCC CGG CTC CTC GGA GCG GCC GCC GGG CAG CGG 97
Pro Arg Asn Pro Pro Ser Arg Leu Leu Gly Ala Ala Ala Gly Gln Arg
15 20 25

TCC AGA CCC AGT ACT TAT TAT GAA CTG TTG GGG GTG CAT CCT GGT GCC 145
Ser Arg Pro Ser Thr Tyr Tyr Glu Leu Leu Gly Val His Pro Gly Ala
30 35 40 45

AGC ACT GAG GAA GTT AAA CGA GCT TTC TTC TCC AAG TCC AAA GAG CTG 193
Ser Thr Glu Glu Val Lys Arg Ala Phe Phe Ser Lys Ser Lys Glu Leu
50 55 60

CAC CCA GAC CGG GAC CCT GGG AAC CCA AGC CTG CAC AGC CGC TTT GTG 241
His Pro Asp Arg Asp Pro Gly Asn Pro Ser Leu His Ser Arg Phe Val
65 70 75

GAG CTG AGC GAG GCA TAC CGT GTG CTC AGC CGT GAG CAG AGC CGC CGC 289
Glu Leu Ser Glu Ala Tyr Arg Val Leu Ser Arg Glu Gln Ser Arg Arg
80 85 90

AGC TAT GAT GAC CAG CTC CGC TCA GGT AGT CCC CCA AAG TCT CCA CGA 337
Ser Tyr Asp Asp Gln Leu Arg Ser Gly Ser Pro Pro Lys Ser Pro Arg
95 100 105

ACC ACA GTC CAT GAC AAG TCT GCC CAC CAA ACA CAC AGC TCC TGG ACA 385
Thr Thr Val His Asp Lys Ser Ala His Gln Thr His Ser Ser Trp Thr
110 115 120 125

CCC CCC AAC GCA CAG TAC TGG TCC CAG TTT CAC AGC GTG AGG CCA CAG 433
Pro Pro Asn Ala Gln Tyr Trp Ser Gln Phe His Ser Val Arg Pro Gln
130 135 140

GGG CCC CAG TTG AGG CAG CAG CAA CAC AAA CAA AAC AAA CAA GTG CTG 481
Gly Pro Gln Leu Arg Gln Gln Gln His Lys Gln Asn Lys Gln Val Leu
145 150 155

GGG TAC TGC CTC CTC CTC ATG CTG GCG GGC ATG GGC CTG CAC TAC ATT 529
Gly Tyr Cys Leu Leu Leu Met Leu Ala Gly Met Gly Leu His Tyr Ile
160 165 170

GCC TTC AGG AAG GTG AAG CAG ATG CAC CTT AAC TTC ATG GAT GAA AAG 577
Ala Phe Arg Lys Val Lys Gln Met His Leu Asn Phe Met Asp Glu Lys
175 180 185

GAT CGG ATC ATC ACA GCC TTC TAC AAC GAA GCC CGG GCA CGG GCC AGG 625
FIGURE 19 (continued)

Asp Arg Ile Ile Thr Ala Phe Tyr Asn Glu Ala Arg Ala Arg Ala Arg
190 195 200 205

GCC AAC AGA GGC ATC CTT CAG CAG GAG CGA CAA CGG CTA GGG CAG CGG 673
Ala Asn Arg Gly Ile Leu Gln Gln Glu Arg Gln Arg Leu Gly Gln Arg
210 215 220

CAG CCG CCA CCA TCC GAG CCA ACC CAA GGC CCC GAG ATC GTG CCC CGG 721
Gln Pro Pro Pro Ser Glu Pro Thr Gln Gly Pro Glu Ile Val Pro Arg
225 230 235

GGC GCC GGC CCC TGA GGGGCTC ACCTGGATGG GGCCTGCAGT GCGTTCCCGC 773
Gly Ala Gly Pro *
240

TTTGCTTCCT TCCCTGGACG GCCCGCTCCC CGAAACGCGC GCAATAAAGT GATTCGCAG 832
FIGURE 20

>sp|P08622|DNAJ_ECOLI DNAJ PROTEIN >pir||HHECOJ heat shock protein dnaJ -
Escherichia coli >gi|145769 (M12565) heat shock protein dnaJ
[Escherichia coli] >gi|216441 (D10483) dnaJ protein [Escherichia
coli]

Length = 376

Score = 138 (63.7 bits), Expect = 1.2e-10, P = 1.2e-10
Identities = 25/62 (40%), Positives = 39/62 (62%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSRFVELSEAYRVLSREQSRRS 94

YYE+LGV A E+**A* ♦ ♦ HPDR* G+ +«-F E* EAY VL+ Q R ♦ 
Sbjct: 6 YYEIU^SKTAEEREIRKAYTOU^HKTHPCRNQGE^^ €5 

Query: 95 YD 96 
YD 

Sbjct: 66 YD 67 



FIGURE 21 






>gi|1703590 (U80439) contains similarity to a DNAJ-like domain [Caenorhabditis
elegans]
Length = 345

Score = 98 (45.2 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 17/37 (45%), Positives = 28/37 (75%)

Query: 28 QRSRPSTYYELLGVHPGASTEEVKRAFFSKSKELHPD 64

R TVYE+LGV A* E+K AF+++SK++KPD 
Sbjct: 22 KXIRQRTHYEVLGVESTATLSEIKSAFYAQSKKVHPD 58 

Score = 74 (34.1 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 17/32 (53%), Positives = 19/32 (59%)

Query: 71 SLHSRFVELSEAYRVLSREQSRRSYDDQLRSG 102

S + F*EL AY VL R RR YD QLR C 
Sbjct: 64 SATASFLELKNAYDVLRRPADRRLYDYQLRGG 95 

Score = 39 (18.0 bits), Expect = 5.2e-12, Sum P(3) = 5.2e-12
Identities = 10/42 (23%), Positives = 19/42 (45%)

Query: 162 LLMLAGMGLHYIAFRKVKQMHLNFMDEKDRIITAFYNEARAR 203

L+++AG Y + Q L+ ♦ + *D I F ♦ R 

Sbjct: 158 LVLVAGYNGGYL YLLAYNQKQLDKL I DQ3E IAKCFLRQKEFR 199 



>gnl|PID|e281266 (Z81030) C01G10.12 [Caenorhabditis elegans]
Length = 191

Score = 96 (44.3 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 17/41 (41%), Positives = 27/41 (65%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSR 75

YYE++GV A* +E++ AF K+K+LHPD+ ♦ SR 
Sbjct: 19 YYE I IGVSASATRQE IRDAFLKKTKQLHPDQSRKSSKSDSR 59 

Score = 54 (24.9 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 10/22 (45%), Positives = 15/22 (68%)

Query: 75 RFVELSEAYRVLSREQSRRSYD 96

♦F* ♦ EAY VL E+ R* YD 
Sbjct: 71 QFMLVTCEAYtT>/LP^IZEKKKEYD 92 

Score = 35 (16.1 bits), Expect = 1.8e-09, Sum P(3) = 1.8e-09
Identities = 9/44 (20%), Positives = 22/44 (50%)

Query: 141 QGPQLRQQQHKQNKQVLGYCLLLMLAGMGLHYIAFRKVKQMHLN 184

+ P+ ♦ KQ ++L +<3 * + RK++ 

Sbjct: 145 RNPEDEYLRKQKNRMLVVIAATVMALIGANTvYIRKLQADRLS 188 
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FIGURE 22 



>sp|Q10209|YAY1_SCHPO HYPOTHETICAL 44.8 KD PROTEIN C4H3.01 IN CHROMOSOME I
>gi|1194014 (Z69380) unknown [Schizosaccharomyces pombe]
Length = 392

Score = 84 (38.8 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 13/36 (36%), Positives = 25/36 (69%)

Query: 35 YYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNP 70



Score = 64 (29.5 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08
Identities = 14/40 (35%), Positives = 23/40 (57%)

Query: 75 RFVELSEAYRVLSREQSRRSYDDQLRSGSPPKSPRTTVHD 114

*F +*SEAY+VL E+ R YD + ♦ P«- T *D 
Sbjct: 50 KFQK I SEAYQVLCDEKLRSQYDQFGKEKAVPEQGFTDAYD 89 

Score = 37 (17.1 bits), Expect = 4.1e-08, Sum P(3) = 4.1e-08 
Identities = 9/29 (31%), Positives = 15/29 (51%) 

Query: 190 DRIITAFYNEARARARANRGILQQERQRL 218

DR A E A A* «• + RQR+ 
Sbjct: 149 DRKXNAQ I REREALAKREQEM I EDRRQR I 177 

Score = 33 (15.2 bits), Expect = 0.00081, Sum P(3) = 0.00081
Identities = 8/19 (42%), Positives = 11/19 (57%)

Query: 140 PQGPQLRQQQHKQNKQVLG 158

PQG ♦ Q+ ♦ QVLG 
Sbjct: 44 PQGASEKFQKISEAYQVLG 62 



>gnl|PID|e253406 (X77635) tumorous imaginal discs [Drosophila virilis]

>gnl|PID|e263866 (Y07700) Tid58 protein [Drosophila virilis]
Length = 529

Score = 153 (70.6 bits), Expect = 9.7e-13, P = 9.7e-13
Identities = 27/71 (38%), Positives = 44/71 (61%)

Query: 26 AGQRSRPSTYYELLGVHPGASTEEVKRAFFSKSKELHPDRDPGNPSLHSRFVELSEAYRV 85



♦ R ♦ YY LGV A+ *++K«-A+* HPD + +P +F «-«-SEAY V 



Sbjct: 72 S S SRM2 AKDYY ATLGVAKNANAKD IKKAYYELAKKYKPDTNKDDFDASK3<FQDVSEAYEV 131 



Sbjct: 



YY+LLG+ A* ♦♦K+A* * * HPD++P +P 
9 YYDUjGISTI^TAvTIIKKAYRKIAVKYHPDKNPDDP 44 



FIGURE 23 



Query:  86 LSREQSRRSYD 96 
           LS +Q RR YD 
Sbjct: 132 LSDDQKRREYD 142 
MCG18 MPPLLPLRLCRLWP-RN--PP SRLLGAA 

HDJ-2 MVKETTYYDVLGVK P^TQEELKKAYRKIALKYHPDKN- - PN EGEKFKQ I SQAYEV 

HDJ-1 MGKD- - YYQTt/JLARGASDEEIKRAYRRQALRYHPDKNKEPG AEEKFKEIAEAYUV 

HSJ1 M - AS - - YYE I LDVP RSASADDI KXAYRRKALQWHPDKN - - P DNKEFAEKKFKEVAEAYEV 



MCG18 AGQRSRPSTY - - YELLGVH PGA ST-EEVKRAFFS-- 

HDJ-2 LSDAKXREL YDKGGEQAIK EGGAGGG FGSPMDIFEKFFGGG 

HDJ-1 LSDPRKREIFDRYGEEGLKG9GP SGGSGGGANCTSFSrTFHGDPHAMFAEFPG- - 

HSJ1 LSDKHIQlErYDRYGRBGLTGTGTGPSRAEAGSGGP- -G- - FTFT - FKSPEEVFREFFG — 



MCG18 KSKELHPDRDPGNP SLHSRFVELSEAYRVLSREJOSRRS- - YDDQLRSGSPPKSPRT 

HDJ-2 GRMQRERRGKN\A/HQLSVTLn3LY*X3ATRKJJu^ 

HDJ-1 GRNPFIHFTGQRNGEEX^IDDPFSGFPM3CGFm^ 

HSJ1 SGDPFAELFDDLGP- - FSELQNRGSRHSGPFFTFSSSFPGHSDFSSSSFSFSPGAGAFRS 



MCG18 TVHDKSAKQTHSSWTPPNAOY WSQFHSVRPQ -GP QLRQQQKKQN 

HDJ-2 TC^QIRIHQIGPCMVQQIQSVCMECQOT^ 

HDJ-1 HDLRVS LEE IYSGCTKKMK ISH-KRLMP— D GKSIRNEDKILTIEVKK 

HSJ1 VSTSTTFVQGRR ITTRR IME NGQ-ERVEVEED GQ LKSVTINGVPD 



MCG18 KQVLGYCLLL KLAGMGI>T^IAFRKVKQKHLWFKDE-KDRI ITAFYNEARARARAN 

HDJ-2 GMKDGQKITFHGEGDQEPGLEPGDI I IVLDQKDHAVFTRRGEDLFMQ1DIQLVEALCGFQ 

HDJ-1 GWKEGTK ITFPKEGDQTSNNI PADIVFVUCDKPHNIFKRDGSDVIYPARISLREALCGCT 

HSJ1 DLARGLELSR- RE- -QQP- SVTSRSGGTQVQQTPASCPLD- SDLSEDEDLQLAMAYSLSE 



MCG18 RGILQQERQRLGQRQPP-PSEPTQGPEIVPRGAGP 

HDJ-2 " KP I STLENRTIVTTSHPGQ IVF3^DIKCV12JEGOTIYRRPYH(GRL I IEFKVNFPEUGFL 

HDJ-1 VNVPTLDGRTI PWFK - -DVTRPGMRRKVPGEGLPLPKTPEKRCaDLI IEFEVTFPSI- - 1 

HSJ1 MEAAGKKPAQGREAQHR-RQGRPRPSTKIQAW3GP — RR--VRG — VKQPNAVHPQR-RR 



MCG18 

HDJ-2 SPDKLSXJJESOjLPERKEVEETDQIDQVELVDro 

HDJ-1 PQTSRTVLEQVLPI 

HSJ1 PLAASSSEHKAQPD LIQILTGGSDSLWEEKRGVS 

MCG18 

HDJ-2 QTS 

HDJ-1 

HSJ1 



* = amino acid identity in all 4 proteins 
. = conservative substitution 
FIGURE 25 



CAAGGAGCCTCTGCCTGCCCGTCGTCGTCATGCCGTCCCTGTTGCTCCAGCTGCCCCTGC 60 
MPSLLLQLPL 10 
GCCTATGCCGGCTGTGGCCGCATAGCCTTTCCATCCGACTTCTCACAGCCGCCACAGGGC 120 
RLCRLWPHSLSIRLLTAATG 30 
AGCGGTCTGTCCCTACTAATTACTATGAATTGTTGGGCGTGCATCC 180 
QRSVPTNYYELLGVHPGASA 50 
AAGAGATTAAACGTGCTTTTTTCACCAAGTCAAAAGAGCTACACCCTGATCGAGACCCTG 240 
EEIKRAFFTKSKELHPDRDP 70 
GGAACCCAGCCCTGCATAGCCGCTTTGTGGAGCTGAATGAGGCATATCGAGTGCTCAGTC 300 
GNPALHSRFVELNEAYRVLS 90 
GTGAGGAAAGTCGTCGTAACTATGACCACCAGCTGCATTCAGCCAGTCCTCCAAAGTCTT 360 
REESRRNYDHQLHSASPPKS 110 
CAGGGAGCACAGCCGAGCCTAAGTATACGCAACAGACACACAGCAGCTCCTGGGAACCCC 420 
SGSTAEPKYTQQTHSSSWEP 130 
CCAACGCTCAATACTGGGCCCAGTTCCACAGTGTGAGGCCGCAGGGGCCGGAGTCAAGGA 480 
PNAQYWAQFHSVRPQGPESR 150 
AGCAGCAGCGTAAACACAACCAGCGGGTCCTCGGGTACTGCCTCCTGCTCATGGTGGCAG 540 
KQQRKHNQRVLGYCLLLMVA 170 
GCATGGGCCTGCACTATGTTGCCTTCAGGAAGCTGGAGCAGGTGCATCGCAGCTTCATTC 600 
GMGLHYVAFRKLEQVHRSFM 190 
ATGAAAAGGACCGGATCATTACAGCCATCTACAATGACACTCGGGCCAGGGCCAGGGCCA 660 
DEKDRIITAIYNDTRARARA 210 
ACAGAGCCAGGATTCAGCAGGAGCGCCArGAGAGGCAGCAGCCTCGGGCAGAACCCTCCC 720 
NRARIQQERHERQQPRAEPS 230 
TGCCTCCAGAAAGCTCCAGGATCATGCCCCAGGACACAAGCCCCTGAGAGGCTTAACTAA 780 
LPPESSRIMPQDTSP* 245 
ATGGGACCTTCATTGGTCCTCTCCCTGCTGCCTGTCCAGAACTACACGTGCAATAAACTC 840 
ATTTTCAG(A)n 849 
FIGURE 26 



human MCG18 
mouse MCG18 



MPPLL PIJILCSU>WPRNP PSRLIilAAA^ 

MPSLLLQU»IJttX3u^WPHSLSIRLLTAA^ 



#»*#*•*## 



human MCG18 SKELHFDRDPGNPSUiSRFVELSEAYK^ 
mouse MCG18 SKELHPDRDPQfPAlJiSEtfVErilEA^ 



human MCG18 HQTHSS-V/TPniAQYWSQniSVRPQGPQLRQQQH^ 

mouse MCG18 QQTHSSSWEPETOQYV^FHSVRPQGPESRKQQWCH^ 

human MCG18 KVKQMHLNFMDEKDRI ITAFYNEARARARANRGILQQ - - 

mouse MCG18 KUS3QVHRSFMEOCDRI ITAXYNDTRARARANRARIQQER HERQQPRAEPSLPPESSR 



human MCG18 
mouse MCG18 



XVPRGAGP 
IMPQDTSP 
FIGURE 27 

ttgaagtctagccccatcctggtccaatgcgctcttggtagcctcctttcccagctgccc 60 
* SLAPSWSNALLVASFPSCP 

gcccgccgccATGCCGCCCTTACTGCCCCTGCGCCTGTGCCGGCTGTGGCCCCGCAACCC 120 
PAAMPPLLPLRLCRLWPRNP> 
FIGURE 28 
This International Searching Authority found multiple inventions in this international application, as follows: 

Invention 1, defined by claims 2, 3, 9, 10, 16-18, is to nucleotide sequences, amino acid sequences and proteins with a 
zinc finger domain. 

Invention 2, defined by claims 4, 5, 11, 12, 19-21, is to nucleotide sequences and amino acid sequences and proteins 
which are guanine exchange factors. 

Invention 3, defined by claims 6, 7, 13, 14, 22-24, is to nucleotide sequences and amino acid sequences and proteins 
which are heat shock proteins or heat shock binding proteins. 

As all required additional search fees were timely paid by the applicant, this international search report covers 
all searchable claims. 

As all searchable claims could be searched without effort justifying an additional fee, this Authority did not 
invite payment of any additional fee. 

As only some of the required additional search fees were timely paid by the applicant, this international search 
report covers only those claims for which fees were paid, specifically claims Nos.: 



i. 0 

2 □ 

3 □ 



[ ] No required additional search fees were timely paid by the applicant. Consequently, this international search 
report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 



Remark on Protest [ ] The additional search fees were accompanied by the applicant's protest. 

[X] No protest accompanied the payment of additional search fees. 